Administration & Troubleshooting
Recall from the Data-Processing Overview that indexers can save message data within temporary storage so that it can be combined with later messages to generate new data. To prevent this storage from growing indefinitely, the Workbench periodically removes bins that are considered outdated with respect to messages most recently processed.
Settings for this process may be adjusted by editing the keys in
<Installation Folder>\Windows\Computation Service\WorkbenchComputationService.exe.config, and restarting the Computation Service:
<appSettings>
  <add key="StaleTime" value="30.00:00:00"/> <!-- Days.HH:MM:SS -->
  <add key="GcCollectEvery" value="1000"/>
  <add key="GcMaximumCollectionInterval" value="01:00:00"/>
  <add key="GcRollingCount" value="5000"/>
</appSettings>
StaleTime is the age a bin must reach, relative to the recently processed messages, before it is removed. The default is 30.00:00:00 (30 days).
GcCollectEvery is the number of processed messages that triggers a cleanup. The default is 1000.
GcMaximumCollectionInterval is the amount of time that triggers a cleanup. The default is 01:00:00 (one hour).
GcRollingCount is the number of messages to consider "recent". The default is 5000.
The temp-bin cleanup process stops ingestion by the Computation Service in order to prevent an invalid database state. By default, this happens every 1000 messages processed, or every hour, whichever comes first. The frequency of these cleanups can be configured.
The process is triggered when a batch is processed from the endpoint queue and, since the last cleanup or the start of the Computation Service:
GcCollectEvery messages have been processed, or
GcMaximumCollectionInterval time has elapsed.
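The trigger condition above can be sketched as follows. This is an illustrative model only, assuming the documented defaults; the function and variable names are hypothetical, not the Computation Service's internals:

```python
from datetime import datetime, timedelta

GC_COLLECT_EVERY = 1000               # GcCollectEvery default
GC_MAX_INTERVAL = timedelta(hours=1)  # GcMaximumCollectionInterval default

def cleanup_due(messages_since_last_cleanup, last_cleanup_time, now):
    """Return True when a temp-bin cleanup should run: either enough
    messages have been processed, or enough time has passed, since the
    last cleanup (whichever comes first)."""
    return (messages_since_last_cleanup >= GC_COLLECT_EVERY
            or now - last_cleanup_time >= GC_MAX_INTERVAL)
```

With the defaults, processing the 1000th message or reaching the one-hour mark both make the check pass.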
Increasing these values will decrease the frequency of cleanups, but may increase the amount of data that needs to be removed in each cleanup process (as more messages will be processed between cleanup triggers).
In general, a bin is eligible for removal if the time of its last update was more than StaleTime ago, relative to the most recently processed messages. By default, bins that have not been updated within the last 30 days will be removed. For most customers this default is suitable, but some administrators may need to adjust StaleTime to fit their use cases.
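The eligibility test can be sketched like this. The reference point chosen below (the newest of the "recent" messages) is an assumption for illustration; the names are hypothetical and not the Workbench's actual internals:

```python
from datetime import datetime, timedelta

STALE_TIME = timedelta(days=30)  # StaleTime default: 30.00:00:00

def bin_is_stale(bin_last_updated, recent_message_dates):
    """recent_message_dates stands in for the receive dates of the last
    GcRollingCount messages; a bin is stale when its last update is more
    than StaleTime older than the most recent of them (an assumption)."""
    reference = max(recent_message_dates)
    return reference - bin_last_updated > STALE_TIME
```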
If this Workbench will receive an exceptionally large amount of data, reducing the StaleTime setting will decrease disk usage and increase ingestion performance.
However, reducing this value too much may lead to data from multiple messages not being correctly correlated. Consider the case where StaleTime is set to 15.00:00:00. If a session sends a Session.Start message, sends no data for 20 days, and then sends a Session.Stop message, the information from the Session.Start will have been removed by the time the Session.Stop arrives, leading to two incomplete sessions being recorded rather than one complete session.
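As a worked version of this scenario (assuming StaleTime = 15.00:00:00 and illustrative dates):

```python
from datetime import datetime, timedelta

stale_time = timedelta(days=15)                     # StaleTime = 15.00:00:00
session_start = datetime(2024, 1, 1)                # Session.Start arrives
session_stop = session_start + timedelta(days=20)   # Session.Stop, 20 days later

# The Session.Start bin goes 20 days without an update, which exceeds the
# 15-day StaleTime, so it is removed before Session.Stop arrives.
start_bin_removed = session_stop - session_start > stale_time
```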
Note: For the purposes of temp-bin cleanup, when a child bin is modified, its ancestors are considered modified as well. In the previous scenario, if feature messages from that session had been regularly received within the 20 days, the session's last-updated time would continue to be updated as well, preventing the Session.Start information from being removed prematurely.
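The ancestor-modification rule in the note can be modeled as a toy sketch. This is a hypothetical illustration of the described behavior, not the Workbench's bin implementation:

```python
from datetime import datetime

class Bin:
    """Toy model: touching a child bin refreshes the last-updated
    time of every ancestor bin as well."""
    def __init__(self, parent=None):
        self.parent = parent
        self.last_updated = datetime.min

    def touch(self, when):
        node = self
        while node is not None:        # walk up the ancestor chain
            node.last_updated = when
            node = node.parent

session_bin = Bin()
feature_bin = Bin(parent=session_bin)
feature_bin.touch(datetime(2024, 1, 10))  # a feature message arrives
# session_bin.last_updated is now 2024-01-10 as well, so the session's
# Session.Start information stays fresh while features keep arriving.
```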
Conversely, increasing StaleTime will lead to more disk usage and reduced ingestion performance. We do not recommend increasing StaleTime beyond the default unless:
The cleanup process is designed to accommodate ingestion of both real-time data and replayed data (from the PreEmptive Analytics Standalone Repository & Replayer), producing the same results regardless of how the data arrived. This is because the dates the cleanup process uses are based on when envelopes first arrived at any PreEmptive Analytics Suite product, rather than the server time at which the envelopes were processed.
This date is transmitted via the X-PreEmptive-ReceiveDate HTTP header. Because Standalone Repository version 1.0 did not support this header, the Workbench endpoint will not accept envelopes replayed from a 1.0 Standalone Repository (see the Known Issue).
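The "use the original receive date, not the processing time" rule might be sketched as below. The header's date format is an assumption here (shown as a standard RFC 1123 HTTP date); only the header name comes from this document:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def envelope_receive_date(headers):
    """Prefer the original receive date carried in the
    X-PreEmptive-ReceiveDate header over local processing time, so that
    replayed and real-time envelopes age identically for cleanup purposes."""
    raw = headers.get("X-PreEmptive-ReceiveDate")
    if raw is not None:
        return parsedate_to_datetime(raw)   # assumed RFC 1123 date format
    return datetime.now(timezone.utc)       # no header: fall back to server time
```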