Administration & Troubleshooting
After rebooting the Workbench host, the Computation Service will very rarely fail to start because it does not start within Windows' default service-start timeout. This causes inbound data to queue in the Workbench endpoint queue in RabbitMQ. This is most often an issue on VMs with poor disk performance.
To mitigate this, the Computation Service is configured with "Delayed Start", meaning Windows will not start it until a few minutes after the OS starts. If you have just rebooted, the service may start on its own within a few minutes; check the Event Log to see whether it has already attempted to start, or start it manually.
If you encounter this issue, you can work around it by increasing the value of the registry setting HKLM\SYSTEM\CurrentControlSet\Control\ServicesPipeTimeout and re-configuring the Windows service to start normally.
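As a sketch, the timeout can be raised with a .reg file like the one below. The 120-second value (the setting is in milliseconds) is illustrative only; the ServicesPipeTimeout value does not exist by default and must be created, and a reboot is required for the change to take effect.

```
Windows Registry Editor Version 5.00

; Raise the service-start timeout to 120 seconds (value is in milliseconds).
; 0x0001d4c0 = 120000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control]
"ServicesPipeTimeout"=dword:0001d4c0
```

After importing this and rebooting, the Computation Service's startup type can be switched back to "Automatic" in the Services control panel.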
After the initial install, the Computation Service is configured with "Delayed Start", but it will not appear that way in the Services control panel. This is a Windows/installer issue; after the first reboot, the Computation Service will correctly appear as configured for Delayed Start.
When manually stopping the Computation Service, or when it is automatically stopped during an uninstall, the service will very rarely fail to stop in a normal amount of time. This can result in an administrator-visible error, an ongoing failure to stop, or a hung uninstaller. This generally only happens if the service is processing a large queue of incoming messages.
If the Computation Service fails to stop, it is safe to kill it manually via Task Manager.
Immediately after installation, a machine reboot, or a long idle period (default 20 minutes), individual features of the Workbench can take an unusually long time to respond (~10 seconds). This is because IIS does not load the services that comprise the Workbench until they are used for the first time, and it takes a few extra seconds for a service to load.
This is noticeable in the following ways:
The Portal configuration files are all contained in the application's install folder, so editing them requires that the editor application be run with administrative rights.
The installed Start menu links to the Workbench Portal and the Workbench Documentation won't work on Windows 8 if the user is an Administrator and has Internet Explorer (Metro) configured as their default web browser. This is by design in Windows 8, but can be worked around by setting a different default browser or by configuring Internet Explorer to always use the Desktop version.
The Query API has been documented thoroughly, but future versions of the Workbench may break backwards-compatibility with the current version of the API. Specifically:
None of these issues will affect custom indexer, pattern, or Server Query development, nor will they affect portal configuration. They will only affect custom client applications that directly access data via the API.
To ensure that temporary storage does not grow too large during a replay scenario, the Workbench temp-bin cleanup relies on an HTTP header, X-PreEmptive-ReceiveDate, being set when envelopes are stored by the Standalone Repository. However, Standalone Repository version 1.0 (and its accompanying Replayer) do not support this header, so replaying from such a repository into Workbench 1.1 or later would cause temporary storage to grow dangerously large, consuming disk space at a rapid rate and significantly reducing ingestion performance. As a result, the Workbench endpoint will not accept envelopes from a replay of a 1.0 Standalone Repository. Please upgrade your Standalone Repository and Replayer to version 1.1, after which the replay will succeed.
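To illustrate why the header matters, the cleanup decision can be modeled as below. This is a hypothetical sketch: the actual date format and retention window the Workbench uses are not documented here, so RFC 1123 dates and a one-hour window are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window for illustration; not the Workbench's actual value.
RETENTION = timedelta(hours=1)

def parse_receive_date(headers):
    """Return the envelope's receive date, or None if the header is absent
    (as with envelopes replayed from a 1.0 Standalone Repository)."""
    value = headers.get("X-PreEmptive-ReceiveDate")
    if value is None:
        return None
    # RFC 1123 date format is an assumption for this sketch.
    parsed = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc)

def eligible_for_cleanup(headers, now):
    received = parse_receive_date(headers)
    if received is None:
        # Without a receive date the envelope's age is unknown, so cleanup
        # cannot run safely -- which is why such envelopes are rejected
        # at the endpoint instead.
        return False
    return now - received > RETENTION
```

The key point is the None branch: an envelope with no receive date can never be aged out, so admitting such envelopes would let temporary storage grow without bound.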
In certain situations, the Query API (used primarily by the Portal) will report "0" for a session or feature length when in fact there was no data. This happens when a day has no complete sessions but does have incomplete sessions. It is most apparent in the Portal Key Stats table, where the Min Length will be reported as 0 because that is the smallest value reported by the server. This issue tends to disappear as the dataset grows larger.
The Users by Day chart for unique and returning users aggregates across all filtered applications per day, while the Key Stats table aggregates across the entire selected period. This means that users who return across multiple days will be counted as unique only once in the Key Stats table, but will be counted as unique for each day in the chart. For instance, if User A visits on day 1 and day 2, and User B visits only on day 2, Key Stats will show two unique users, one of them returning (a total of 2 users), while the Users by Day chart will show one unique user on day 1 and two on day 2 (a total of 3).
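The two aggregations in that example can be sketched with Python sets; "a" and "b" stand in for user IDs and the day labels are arbitrary.

```python
# Each day maps to the set of user IDs seen that day.
sessions_by_day = {
    "day1": {"a"},
    "day2": {"a", "b"},
}

# Users by Day chart: unique users are counted per day, so a user who
# returns on a later day is counted again on that day.
daily_unique = {day: len(users) for day, users in sessions_by_day.items()}
chart_total = sum(daily_unique.values())  # 1 + 2 = 3

# Key Stats table: unique users are counted once across the whole period.
period_unique = len(set().union(*sessions_by_day.values()))  # 2
```

Neither number is wrong; they simply answer different questions (per-day uniqueness vs. period-wide uniqueness).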
The Sessions by Country table on the Overview report may show "Unknown" for the country when testing. This can happen if the source data is coming from an internal-only IP address. Data sent over the public internet should be associated with the appropriate country.
If only a single day of data is shown in the Users by Day chart, it can appear that the data is not plotted correctly. With a single day, the chart looks like a normal line chart, but it is actually a stacked (area) chart, and the data is rendered correctly. Once a second day of data becomes available, the rendering makes sense.
Patterns are not applied to dynamic output schemas, which includes those created by Custom Data Filters. In the default set of reports, this means that the Custom Data report cannot be filtered by OS, Runtime, or Country.
The default pattern that collects OS, Runtime, and Country data does not collect that data for Tamper and Debug Check messages (from e.g. an application run through Dotfuscator). This is because such messages can be generated and sent outside of normal session scope, and don't have the corresponding OS, Runtime, and Country data available.
In the default set of reports, this means that the Tamper and Debug Check reports cannot be filtered by OS, Runtime, or Country.
The charts in the Workbench will often obscure the fact that no data exists for a particular day (or hour), because they don't plot that missing data as an empty data point. Instead, they just plot the data points that are provided by the server.
For example, if the Exceptions by Day chart has data for Tuesday and Thursday, but not Wednesday, it will plot a line between Tuesday's point and Thursday's point, with no clear indication that Wednesday's point is not part of that line. Similarly, for the Session Times chart, if no data exists for a particular hour of the day, that hour is simply omitted from the chart.
The Workbench uses an algorithm called HyperLogLog to keep track of unique user ID counts. HyperLogLog is an estimation algorithm, so it does not have perfect accuracy. In practice the inaccuracy is rarely noticeable, and when the dataset is small the numbers often end up perfectly accurate anyway. However, with small datasets it is possible to see "off by one" situations, or seemingly-impossible data (e.g. more New users than Unique users) that is nonetheless within the margin of error.
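For illustration, a minimal HyperLogLog can be sketched as below. The Workbench's actual implementation and parameters are internal; this sketch only shows why estimates can be slightly off while duplicates never inflate the count. With m = 1024 registers the expected relative error is roughly 1.04 / sqrt(1024), about 3%.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                    # 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item; the first p bits pick a register.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Re-adding an item can never raise the max, so duplicates are free.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction improves accuracy for tiny datasets.
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)
```

Merging two counters of the same size takes the elementwise maximum of their registers, which is why a merged estimate is only as accurate as its least accurate input.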
New vs. returning users are determined using a Bloom filter, which must be pre-configured with a "capacity" (i.e. the expected maximum number of unique users). The default is 100K per application/version combination (so 100K for ACME App 1.0, and another 100K for ACME App 1.0.1). On average, the Bloom filter will be perfectly accurate up to about 70% of the expected users, and about 2% inaccurate at the expected capacity. Beyond that, the inaccuracy rapidly increases; at 150K unique users it may be 20% or more.
If you expect more than 100K unique users per application-version, change the configuration setting ExpectedUniqueUsersPerApplication to a value roughly 50-100% larger than you expect; this should ensure that the user counts remain accurate. This setting only affects application-versions that have not been seen before, so we strongly recommend setting it as soon as possible. If the number of users has already exceeded capacity and you wish to fix the situation, you must purge and replay the Workbench. The Workbench log will report a warning when the filter is approaching its capacity, and an error when the inaccuracy increases beyond 5%.
Setting ExpectedUniqueUsersPerApplication to a larger value will increase disk usage, at a rate of approximately 1 byte per expected user. This means the default setting takes 100KB of space per application-version, and a setting of 1M expected users will take 1MB per application-version.
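To illustrate how accuracy degrades once the configured capacity is exceeded, here is a minimal Bloom filter sized with the standard formulas. This is a sketch, not the Workbench's internal implementation; the 2% target error mirrors the behavior described above.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, capacity, error=0.02):
        # Standard sizing: optimal bit count and hash count for the target
        # error rate at the stated capacity (~1 byte per expected user
        # at a 2% error rate, matching the disk-usage note above).
        self.n_bits = max(8, int(-capacity * math.log(error) / (math.log(2) ** 2)))
        self.k = max(1, round((self.n_bits / capacity) * math.log(2)))
        self.bits = bytearray((self.n_bits + 7) // 8)

    def _positions(self, item):
        # Kirsch-Mitzenmacher double hashing derives k positions from one hash.
        h = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(h[:8], "big")
        h2 = int.from_bytes(h[8:16], "big")
        return [(h1 + i * h2) % self.n_bits for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # A false positive here is what makes a genuinely new user look
        # like a returning one.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A filter sized for 1,000 users stays accurate below capacity, but once it is filled well past capacity the false-positive rate climbs sharply, which is the inaccuracy growth described above.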
Another nuance to this implementation is that the determination that a user ID is "new" happens the moment the Workbench receives the first message with that user ID, so the Workbench assigns the date of that message (i.e. when it was generated) as the day that a new user appeared. This can be inaccurate if the messages arrived at the server out-of-order. For example, if the user started using the app on Tuesday, and also used it on Wednesday, but then for some reason Wednesday's messages arrived at the server before Tuesday's, then the user will be counted as "new" on Wednesday.
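The first-seen behavior above can be sketched as follows; the field names are illustrative, not the Workbench's actual message schema.

```python
def new_user_dates(messages_in_arrival_order):
    """Assign each user a 'new' date: the generated date of the first
    message to ARRIVE, not the earliest date the user was active."""
    first_seen = {}
    for msg in messages_in_arrival_order:
        user = msg["user_id"]
        if user not in first_seen:
            first_seen[user] = msg["generated_on"]
    return first_seen

# Wednesday's message arrives before Tuesday's, so the user is counted
# as "new" on Wednesday even though activity actually began on Tuesday.
arrivals = [
    {"user_id": "u1", "generated_on": "Wednesday"},
    {"user_id": "u1", "generated_on": "Tuesday"},
]
```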
The Workbench also uses a Bloom filter to keep track of unique features, so the same behavior and caveats listed above apply. The default setting is 1K unique features per session (not per application/version, as with the user filter) and can be changed by modifying ExpectedMaxFeaturesPerSession in the config. Again, this setting applies only to future sessions, and the log will contain warnings/errors for sessions with too many unique features.
Versions of the Workbench prior to 1.1 had misconfigured HyperLogLog counters (see above) that led to inaccuracy in user counts. Beginning in version 1.1, this has been addressed through a configurable error rate, set to 0.03 (3%) by default. This setting ensures sufficient accuracy for most scenarios, but may be adjusted if desired.
HyperLogLog counters of different accuracies can be successfully merged; however, the accuracy of the merged counter is equal to that of the least accurate input. In practice, this means that queries which include pre-1.1 data will retain the higher error levels even when overlapping with newer data. To benefit from the increased accuracy, the query range as a whole must contain only data processed by version 1.1 or later.
If this behavior is insufficient, then the Workbench database should be purged and a full replay performed.
OS, Runtime, and Location data are sent with session start events. If the session start message has not been received -- or the OS/Runtime/Location data cannot be determined from it -- then no OS, Runtime, or Location drill-down information will be available for that session. This means that the sum of the sessions in the tables at the bottom of the Overview page will not correspond to the total number of sessions listed in the Key Statistics table. In addition, the information from that session cannot be pivoted on.
Prior versions of the Workbench displayed "Unknown" values when data was missing, and upgraded Workbenches will continue to display those sessions. If this is not desired, the Workbench can be reset and the data replayed.
Depending on ingestion timing, multiple reported exceptions may be considered the same exception (e.g., on the default Exceptions report) if they have the same stack trace but differ by exception type, message, and/or whether the exception was Caught, Thrown, or Unhandled.
In versions of the Workbench prior to 1.2, certain kinds of reported exceptions would lose their stack trace signature information upon being ingested by the Workbench.
While this issue is fixed in Workbench 1.2, previously-ingested data cannot be fixed by the upgrade process. This means that exception information received before the upgrade will still lack stack trace signatures, while exception information received after the upgrade will correctly contain the full signature information. As a result, the same exception may be reported twice in the Portal: once with an incomplete stack trace (counting occurrences received before the upgrade), and once with a complete stack trace (counting occurrences received after the upgrade).
If this behavior is undesirable, then the Workbench database should be purged and a full replay performed.
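The double reporting can be illustrated by grouping occurrences on a (type, signature) key; the field and value names here are hypothetical.

```python
from collections import Counter

# Occurrences ingested before the 1.2 upgrade carry no stack-trace
# signature (None here), so they group separately from post-upgrade
# occurrences of the very same exception.
occurrences = [
    {"type": "NullReferenceException", "signature": None},        # pre-upgrade
    {"type": "NullReferenceException", "signature": None},        # pre-upgrade
    {"type": "NullReferenceException", "signature": "ab12cd34"},  # post-upgrade
]

report = Counter((o["type"], o["signature"]) for o in occurrences)
# One exception, two report rows: a signatureless row with 2 occurrences
# and a fully-signed row with 1.
```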
The 'Share' feature of the Portal provides a dialog with a "Copy and Close" button that works only if Flash is installed in the user's browser. If Flash is not installed, the dialog has only a "Close" button.