Please review the requirements and recommendations below before installing the Data Hub.
The Data Hub can be run on either physical hardware or a virtual machine. Below are suggested specifications for either type of host.
In normal operation, the Data Hub requires very little free disk space on its host, because all data is delivered as soon as it is received. In scenarios where downstream destinations are offline or returning errors, though, the Data Hub needs to have enough free disk space to store all messages that are received (for all offline/error destinations) until they come back online. The amount of disk space required depends directly on the rate at which messages come in, and the size of those messages.
You can estimate the maximum disk space needed for queued data (in bytes) using the following formula:
offlineDuration * durableDestCount * incomingRate * messageSize
offlineDuration is the longest time you want to be able to queue messages while a downstream destination is offline, in seconds.
durableDestCount is the number of durable destinations to be configured, each of which will receive all messages.
incomingRate is the average incoming message rate, in messages per second.
messageSize is the average incoming message size, in bytes per message.
For example, the minimum recommendation assumes 2 durable destinations that could both be offline for up to a week:
1 week (604800 seconds) * 2 destinations * 10 messages/second * 4000 bytes/message
which is 48,384,000,000 bytes, or approximately 48 GB of queued data.
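The formula above can be sketched as a small Python helper for trying out other values. The function and parameter names are illustrative and are not part of the Data Hub; the inputs in the usage lines are the ones from the example above.

```python
def queued_disk_bytes(offline_duration_s, durable_dest_count, incoming_rate, message_size):
    """Estimate the maximum disk space (in bytes) needed for queued data.

    offline_duration_s: longest outage to cover, in seconds
    durable_dest_count: number of durable destinations
    incoming_rate:      average incoming rate, in messages per second
    message_size:       average message size, in bytes per message
    """
    return offline_duration_s * durable_dest_count * incoming_rate * message_size


SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800

# Values from the example: 2 destinations offline for a week,
# 10 messages/second, 4000 bytes/message.
estimate = queued_disk_bytes(SECONDS_PER_WEEK, 2, 10, 4000)
print(f"{estimate} bytes (~{estimate / 1e9:.1f} GB)")  # 48384000000 bytes (~48.4 GB)
```

Because the estimate is a plain product, doubling any one input (for example, the incoming rate) doubles the disk space required.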
The Data Hub requires a 64-bit OS from the following list:
It also depends on the following Windows features, which must be installed in this order:
In normal operation, with an average message size of 4 KB and an incoming message rate under 200 messages/second, the Data Hub requires very little memory (less than 1 GB). In extreme cases with larger message sizes and higher incoming message rates, using the default Dispatch Service Concurrency Settings, we have found that:
These tests were run with these specifications and hardware configuration.
In order to receive analytics data, the Data Hub must be deployed in a location that is reachable by instrumented applications and other upstream clients, now and in the future. If the instrumented applications communicate over the internet, rather than an internal network, the Data Hub should be placed in a DMZ.
For similar reasons, the external hostname of the machine should be stable; otherwise, previously instrumented applications pointing to that location would fail to deliver their messages.
For most use cases, any host meeting the minimum system requirements can handle any normal rate or volume of data. In rare cases related to destination throughput, you may need to tune the runtime performance settings of the Data Hub; please see the Performance Tuning page for details.
If you anticipate having very high rates of throughput (e.g. 200+ messages per second), we recommend choosing a RabbitMQ data folder location on a secondary (non-OS) drive. We have also found that an SSD on the OS drive can improve throughput. We have not found a significant throughput improvement from using an SSD for the RabbitMQ drive.
PreEmptive Solutions has put the Data Hub through extensive performance testing with the following host configuration:
This host was able to accept and successfully deliver an overall incoming throughput of 500 messages/second (4 KB messages) to multiple destinations under a variety of runtime conditions (e.g. offline destinations, error responses, large queues, high-latency destinations) without completely consuming any system resource (e.g. CPU, memory, disk, network).
Note that this test was done over an internal network; if it had been done over the internet, network bandwidth would likely have limited the overall throughput.
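As a rough way to see why bandwidth matters at this rate, the sketch below estimates the raw payload bandwidth for a given message rate and size. The function name is hypothetical. It assumes that a 4 KB message is 4000 bytes (as in the disk space example) and ignores protocol overhead, so real usage would be higher. The count of 2 destinations is an illustrative assumption, since the test is described only as delivering to multiple destinations.

```python
def payload_mbps(rate, message_size, copies=1):
    """Raw payload bandwidth in megabits per second (no protocol overhead).

    rate:         messages per second
    message_size: bytes per message
    copies:       how many times each message is sent (e.g. one per destination)
    """
    return rate * message_size * copies * 8 / 1_000_000


# 500 messages/second of 4000-byte messages coming in.
incoming = payload_mbps(500, 4000)
# Assumed for illustration: each message is forwarded to 2 destinations.
outgoing = payload_mbps(500, 4000, copies=2)
print(f"incoming ~{incoming:.0f} Mbit/s, outgoing ~{outgoing:.0f} Mbit/s")
```

Comparing these figures with the bandwidth actually available between the Data Hub, its clients, and its destinations gives a first indication of whether the network is likely to be the limiting factor.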