PreEmptive Analytics Workbench User Guide

Known Issues

Administrative issues

Computation Service fails to start (rarely)

After the Workbench host is rebooted, the Computation Service will very rarely fail to start, because it does not start within the default Windows service-startup timeout. When this happens, inbound data queues up in the Workbench endpoint queue in RabbitMQ. This is most often an issue on VMs with poor disk performance.

To mitigate this, the Computation Service is configured with "Delayed Start", which means that Windows does not start it until a few minutes after the OS starts. If you have just rebooted, the service may therefore start on its own within a few minutes. You can check the Event Log to see whether it has already attempted to start, or you can start it manually.

If you encounter this issue, you can work around it by increasing the ServicesPipeTimeout value (a REG_DWORD, in milliseconds) under HKLM\SYSTEM\CurrentControlSet\Control and reconfiguring the Windows service to start as Automatic instead of Automatic (Delayed Start). The new timeout takes effect after the next reboot.
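The workaround comes down to two standard Windows commands: reg.exe to set ServicesPipeTimeout, which is a REG_DWORD value under HKLM\SYSTEM\CurrentControlSet\Control, and sc.exe to change the startup type. The sketch below only builds those command lines so you can review them before running them from an elevated prompt. The function name, the 120,000 ms default, and the service name passed in are assumptions made for illustration. Look up the Computation Service's actual service name in the Services control panel.

```python
def build_workaround_commands(service_name: str, timeout_ms: int = 120_000) -> list[str]:
    """Return reg.exe and sc.exe command lines for the ServicesPipeTimeout workaround.

    service_name is a placeholder (HYPOTHETICAL): use the Computation Service's
    real service name. timeout_ms is an illustrative value, not a recommendation.
    """
    if timeout_ms <= 30_000:
        # Windows' default service start timeout is 30,000 ms, so a value at
        # or below it would not help.
        raise ValueError("timeout_ms should exceed the 30,000 ms Windows default")
    return [
        # ServicesPipeTimeout is a REG_DWORD (milliseconds) under ...\Control.
        r"reg add HKLM\SYSTEM\CurrentControlSet\Control "
        f"/v ServicesPipeTimeout /t REG_DWORD /d {timeout_ms} /f",
        # "start= auto" (note the space after '=') selects plain Automatic start.
        f'sc config "{service_name}" start= auto',
    ]


for cmd in build_workaround_commands("ExampleComputationService"):
    print(cmd)
```

After running the commands and rebooting, confirm in the Services control panel that the startup type reads "Automatic".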

"Delayed Start" setting doesn't appear until first host reboot

After the initial install, the Computation Service is configured with "Delayed Start", but the Services control panel will not show it as such. This is a Windows / installer issue. After the first reboot, the Services control panel correctly shows the Computation Service as configured for Delayed Start.

Computation Service fails to stop (rarely)

When the Computation Service is stopped manually, or automatically during an uninstall, it will very rarely fail to stop in a normal amount of time. This can result in an error shown to the administrator, a service that never finishes stopping, or a hung uninstaller. This generally happens only if the service is processing a large queue of incoming messages.

If the Computation Service fails to stop, it is safe to kill it manually via Task Manager.

Slow initial startup / access times

Immediately after installation, a machine reboot, or a long idle period (20 minutes by default), individual features of the Workbench can take an unusually long time (~10 seconds) to respond. This is because IIS does not load the services that make up the Workbench until they are first used, and loading a service takes a few extra seconds.

This is noticeable in the following ways:

  • The first message sent to the Workbench (via the Data Hub) after the Workbench is installed will not appear on the Portal for approximately 90 seconds, due to the Workbench starting up, and the Data Hub caching the message and retrying it 60 seconds later.
  • The first Portal access can be unusually slow, especially during the "Fetching metadata" portion of Portal startup.

Administrative account issues

The Portal configuration files are all contained in the application's install folder, so editing them requires that the editor application be run with administrative rights.

The installed Start menu links to the Workbench Portal and the Workbench Documentation won't work on Windows 8 if the user is an Administrator and has Internet Explorer (Metro) configured as their default web browser. This is by design in Windows 8, but can be worked around by setting a different default browser or by configuring Internet Explorer to always use the Desktop version.

Query API issues

The Query API has been documented thoroughly, but future versions of the Workbench may break backwards-compatibility with the current version of the API. Specifically:

  1. Query (and metadata query) responses include extra properties that should not be present, and which will be removed in future versions.
  2. Some queries require setting fields to a specific value (e.g. a server-side class name) that has no alternate options; such fields may be removed in future versions.
  3. Some queries will not work correctly unless certain fields are requested as aggregated fields, but there is no way to programmatically identify those fields. Future versions will include a way to identify those fields.
  4. Some queries (e.g. Custom Data, Tamper) will not return any data if certain fields (i.e. OS, Runtime, Country) are requested as aggregated fields, but there is no way to programmatically identify those cases. Future versions will either change this behavior, remove those fields, or include a way to identify those fields.
  5. The metadata query includes a list of all possible {company,application,version} tuples; that data may be moved to a separate query in a future release.

None of these issues will affect custom indexer, pattern, or Server Query development, nor will they affect portal configuration. They will only affect custom client applications that directly access data via the API.
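One way a custom client can insulate itself from points 1 and 5 is to read only the properties it actually needs and ignore everything else. The sketch below is deliberately defensive. The "rows" key, the "application" and "sessions" field names, and the SessionRow type are hypothetical stand-ins, because the response shape of your own queries is not shown here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRow:
    # HYPOTHETICAL fields: substitute the fields your own queries request.
    application: str
    sessions: int


def parse_rows(body: str) -> list[SessionRow]:
    """Keep only the properties this client needs and ignore all others, so
    extra properties in a response (or their later removal) do not break it."""
    rows = []
    for item in json.loads(body).get("rows", []):  # "rows" is an assumed key
        rows.append(SessionRow(application=str(item["application"]),
                               sessions=int(item["sessions"])))
    return rows


print(parse_rows('{"rows": [{"application": "ACME", "sessions": 3, "extra": 1}]}'))
```

The design choice is to fail loudly only when a field the client depends on is missing, and never because a field it does not use appears or disappears.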

Replaying from Standalone Repository version 1.0

To keep temporary storage from growing too large during a replay, the Workbench temp-bin cleanup relies on the HTTP header X-PreEmptive-ReceiveDate, which is set when the Standalone Repository stores envelopes. However, Standalone Repository version 1.0 (and its accompanying Replayer) does not support this header. Replaying from such a repository into Workbench 1.1 or later would therefore make temporary storage grow dangerously large, rapidly consuming disk space and significantly reducing ingestion performance. As a result, the Workbench endpoint does not accept envelopes from a replay of a 1.0 Standalone Repository. Please upgrade your Standalone Repository and Replayer to version 1.1, after which the replay will succeed.

User-visible issues

"0" reported for Length fields, incorrectly

In certain situations, the Query API (used primarily by the Portal) will report "0" for a session or feature length when there was actually no data. This happens when a day has no complete sessions but does have incomplete ones. It is most apparent in the Portal Key Stats table, where Min Length will be reported as 0 because that is the smallest value reported by the server. This issue tends to disappear as the dataset grows.

The Users by Day chart may have different numbers than the Key Stats summary due to different aggregation

The Users by Day chart counts unique and returning users per day, across all filtered applications, while the Key Stats table aggregates across the entire selected period. This means that a user who returns on multiple days is counted as unique only once in the Key Stats table, but is counted as unique on each of those days in the chart. For instance, if User A visits on day 1 and day 2, and User B visits only on day 2, Key Stats will show two unique users, one of whom is returning (a total of 2 users). The Users by Day chart will show one unique user on day 1 and two on day 2 (a total of 3).
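The example above can be reproduced with plain sets. This is a simplified model of the two aggregations, not the Workbench's query code. In particular, it treats a user as "returning" if they appear on more than one day in the period, which is an assumption made for illustration.

```python
# Visits from the example: day -> set of user IDs seen that day.
visits = {1: {"A"}, 2: {"A", "B"}}


def key_stats(visits):
    """Whole-period aggregation: each user is counted once."""
    unique = set().union(*visits.values())
    days_seen = {u: sum(u in users for users in visits.values()) for u in unique}
    returning = sum(1 for n in days_seen.values() if n > 1)
    return len(unique), returning


def users_by_day(visits):
    """Per-day aggregation: a user is unique again on every day they appear."""
    return {day: len(users) for day, users in sorted(visits.items())}


print(key_stats(visits))     # (2, 1)
print(users_by_day(visits))  # {1: 1, 2: 2}
```

Summing the per-day values gives 3, while the whole-period count is 2. Both are correct for the aggregation they use.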

"Unknown" countries

The Sessions by Country table on the Overview report may show "Unknown" as the country during testing. This can happen if the source data comes from an internal-only IP address (e.g. 10.0.x.x or 192.168.x.x). Data sent over the public internet should be associated with the appropriate country.
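To check whether test traffic is coming from an internal-only address, you can classify the source IP with Python's standard ipaddress module. This is only a diagnostic sketch. The function name is hypothetical, and the module's definition of "private" is an approximation that may not match exactly which addresses the Workbench can geolocate.

```python
import ipaddress


def is_internal(addr: str) -> bool:
    """True for addresses that cannot be geolocated to a country, such as the
    private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and loopback."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback or ip.is_link_local


for addr in ("10.0.3.7", "192.168.1.20", "8.8.8.8"):
    print(addr, is_internal(addr))
```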

The Users by Day chart may appear to render inaccurately

If only a single day of data is shown in the Users by Day chart, the chart can look as if it isn't plotting the data correctly. With a single day it looks like an ordinary line chart, but it is actually a stacked (area) chart, and the data is rendered correctly. As soon as a second day of data becomes available, the rendering makes sense.

Patterns are not applied to dynamic output schemas and Custom Data Filters

Patterns are not applied to dynamic output schemas, which includes those created by Custom Data Filters. In the default set of reports, this means that the Custom Data report cannot be filtered by OS, Runtime, or Country.

OS, Runtime, and Country data is not applied to Tamper or Debug Check data

The default pattern that collects OS, Runtime, and Country data does not collect that data for Tamper and Debug Check messages (from e.g. an application run through Dotfuscator). This is because such messages can be generated and sent outside of normal session scope, and don't have the corresponding OS, Runtime, and Country data available.

In the default set of reports, this means that the Tamper and Debug Check reports cannot be filtered by OS, Runtime, or Country.

Charts don't show missing days as expected

The charts in the Workbench will often obscure the fact that no data exists for a particular day (or hour), because they don't plot that missing data as an empty data point. Instead, they just plot the data points that are provided by the server.

For example, if the Exceptions by Day chart has data for Tuesday and Thursday, but not Wednesday, it will draw a line from Tuesday's point to Thursday's point, with no clear indication that Wednesday has no data. Similarly, if the Session Times chart has no data for a particular hour of the day, that hour is omitted from the chart.
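A custom client that reads chart data from the Query API could make these gaps explicit by filling in the missing days itself. This sketch assumes the daily series has already been parsed into a dict keyed by date. That shape and the function name are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(points: dict[date, int], start: date, end: date, fill=0):
    """Return (day, value) pairs for every day from start to end inclusive,
    using `fill` for days the server returned no data for."""
    span = (end - start).days
    return [(start + timedelta(days=d), points.get(start + timedelta(days=d), fill))
            for d in range(span + 1)]


# 5 Jan 2016 was a Tuesday; Wednesday (6 Jan) has no data.
series = {date(2016, 1, 5): 4, date(2016, 1, 7): 2}
print(fill_missing_days(series, date(2016, 1, 5), date(2016, 1, 7)))
```

Filling with 0 plots Wednesday as an explicit zero. Filling with None instead lets many charting libraries break the line at the gap, which can be clearer when "no data" and "zero" mean different things.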

New, Unique, and Returning Users calculation issues

The Workbench uses an advanced algorithm called HyperLogLog to keep track of unique user ID counts. HyperLogLog is an estimation algorithm, so it is not perfectly accurate. In practice the inaccuracy is rarely noticed, and with small datasets the numbers often end up exactly right anyway. However, small datasets can show "off by one" situations or seemingly impossible data (e.g. more New users than Unique users) that is still within the margin of error.
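To show why HyperLogLog produces estimates rather than exact counts, here is a minimal textbook HyperLogLog in Python. It is not the Workbench's implementation. The register count (p = 10, so 1024 registers), the SHA-1-based hash, and the class name are choices made for this sketch. Each item's hash selects a register, and that register keeps the largest "position of the first 1-bit" seen. The count is then estimated from all registers, with linear counting used for small sets.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, 64-bit hash, linear counting for small sets."""

    def __init__(self, p: int = 10):
        # p >= 7 keeps the alpha constant below valid (it assumes m >= 128).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias-correction constant
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return round(self.m * math.log(self.m / zeros))  # linear counting
        return round(raw)


hll = HyperLogLog()
for i in range(1000):
    hll.add(f"user-{i}")
print(hll.count())  # close to 1000, but not guaranteed to be exact
```

Adding the same user ID again never changes the estimate, which is what makes the structure suitable for unique counts. With 1024 registers, the textbook standard error is about 1.04/√1024 ≈ 3.3%.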

New vs. returning users are determined using a Bloom filter, which has to be pre-configured with a "capacity" (i.e. the expected maximum number of unique users). The default capacity is 100K per application/version combination (so 100K for ACME App 1.0, and another 100K for ACME App 1.0.1). On average, the Bloom filter will be perfectly accurate up to about 70% of its capacity and will have 2% inaccuracy at capacity. Beyond that, the inaccuracy increases rapidly: at 150K unique users it may be 20% or more.

If you expect more than 100K unique users per application-version, please change the configuration setting ExpectedUniqueUsersPerApplication to a value roughly 50-100% larger than you expect. This should keep the user counts accurate. The setting only affects application-versions that haven't been seen before, so we strongly recommend setting it as soon as possible. If the number of users has already exceeded capacity and you wish to fix the situation, you must purge and replay the Workbench. The Workbench log reports a warning when the filter approaches its capacity, and an error when the inaccuracy exceeds 5%.

Increasing ExpectedUniqueUsersPerApplication increases disk usage by approximately 1 byte per expected user. The default setting therefore takes about 100KB per application-version, and a setting of 1M expected users takes about 1MB per application-version.
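A Bloom filter answers "have I seen this ID before?" with no false negatives but occasional false positives. In a new-vs-returning check, a false positive makes a new user look like a returning one. The sketch below uses the standard sizing formulas (bits m = -n·ln(p)/(ln 2)², hash count k = (m/n)·ln 2) and double hashing. It is not the Workbench's implementation: the 2% target rate, SHA-256 hashing, and all names are assumptions. With n = 100,000 and p = 0.02, the formula gives about 814,000 bits (roughly 100KB), which is consistent with the ~1 byte per expected user noted above.

```python
import hashlib
import math


def bloom_size(capacity: int, fp_rate: float) -> tuple[int, int]:
    """Standard sizing: bits m = -n*ln(p)/(ln 2)^2, hash count k = (m/n)*ln 2."""
    bits = math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2)
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


class BloomFilter:
    def __init__(self, capacity: int, fp_rate: float = 0.02):
        self.nbits, self.k = bloom_size(capacity, fp_rate)
        self.bits = bytearray((self.nbits + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.k)]

    def add_if_new(self, item: str) -> bool:
        """Add item; return True if it was (probably) not seen before."""
        new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new


bits, k = bloom_size(100_000, 0.02)
print(bits, (bits + 7) // 8, k)  # about 814K bits, about 100KB, 6 hash functions
```

Once more IDs are inserted than the filter was sized for, more bits are already set, so add_if_new returns False for genuinely new IDs more often. That is why accuracy degrades quickly past capacity.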

Another nuance of this implementation is that a user ID is determined to be "new" the moment the Workbench receives the first message with that user ID, and the Workbench assigns the date of that message (i.e. when it was generated) as the day the new user appeared. This can be inaccurate if messages arrive at the server out of order. For example, suppose the user started using the app on Tuesday and also used it on Wednesday, but Wednesday's messages arrived at the server before Tuesday's. In that case the user will be counted as "new" on Wednesday.
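The effect of arrival order can be modeled in a few lines. This is a simplified illustration of the behavior described above, not Workbench code, and the function name is hypothetical. Messages are processed in the order they arrive, and the first one to arrive for a user fixes that user's "new" day.

```python
def first_seen_days(messages):
    """messages: (user_id, generated_day) pairs in ARRIVAL order.
    The first message to arrive for a user fixes the user's "new" day."""
    new_day = {}
    for user, day in messages:
        new_day.setdefault(user, day)
    return new_day


# Wednesday's message arrives before Tuesday's.
arrivals = [("u1", "Wed"), ("u1", "Tue")]
print(first_seen_days(arrivals))  # {'u1': 'Wed'}
```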

Unique Feature calculation issues

The Workbench also uses a Bloom filter to keep track of unique features, so the same behavior and caveats described above apply. The default capacity is 1K unique features per session (not per application/version, as with the user filter) and can be changed by modifying ExpectedMaxFeaturesPerSession in the config. Again, this setting applies only to future sessions, and the log will contain warnings/errors for sessions that have too many unique features.

User counts may be inaccurate when viewing queries with Workbench pre-1.1 data

Versions of the Workbench prior to 1.1 had misconfigured HyperLogLog counters (see above) that made user counts inaccurate. Beginning in version 1.1, this has been addressed with a configurable error rate, HyperLogLogError, which is set to 0.03 (3%) by default. This setting provides sufficient accuracy for most scenarios, but may be adjusted if desired.

HyperLogLog counters of different accuracies can be merged; however, the merged result is only as accurate as the least accurate counter. In practice, this means that queries that include pre-1.1 data will retain high error levels even where they overlap with newer data. To benefit from the increased accuracy, the entire query range must contain only data processed by version 1.1 or later.
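The relationship between an error rate such as HyperLogLogError and counter size, and why a merge inherits the worse error, can be sketched with the textbook standard-error bound 1.04/√m. This guide does not say whether the Workbench sizes its counters with exactly this formula. The function names and the 256-register "older counter" below are hypothetical.

```python
import math


def registers_for_error(error: float) -> int:
    """Smallest power-of-two register count m with 1.04/sqrt(m) <= error."""
    m = (1.04 / error) ** 2
    return 1 << math.ceil(math.log2(m))


def standard_error(m: int) -> float:
    return 1.04 / math.sqrt(m)


def merged_error(m_a: int, m_b: int) -> float:
    """Merging keeps only the coarser resolution, so the result has the
    error of the counter with fewer registers."""
    return standard_error(min(m_a, m_b))


print(registers_for_error(0.03))  # 2048
print(merged_error(2048, 256))    # 0.065, the coarser counter's error
```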

If this behavior is unacceptable, then the Workbench database should be purged and a full replay performed.

Lack of OS, Runtime or Location data affects data drill down

OS, Runtime and Location data are sent with session start events. If the session start message has not been received, or the OS/Runtime/Location data cannot be determined from it, then no OS, Runtime, or Location drill-down information will be available for that session. This means that the sum of the sessions in the tables at the bottom of the Overview page will not match the total number of sessions listed in the Key Statistics table. In addition, that session's data cannot be pivoted on.

Prior versions of the Workbench displayed "Unknown" values when data was missing, and Workbenches upgraded from those versions will continue to display those sessions as "Unknown". If this is not desired, the Workbench can be reset and the data replayed.

Exceptions may be considered the same if they share the same stack trace

Depending on ingestion timing, multiple reported exceptions may be treated as the same exception (e.g. on the default Exceptions report) if they have the same stack trace but differ in exception type, message, and/or whether the exception was Caught, Thrown, or Unhandled.

Upgraded Exception Data has Inconsistent Stack Traces with Newly-Received Exception Data

In versions of the Workbench prior to 1.2, certain kinds of reported exceptions would lose their stack trace signature information upon being ingested by the Workbench.

While this issue is fixed in Workbench 1.2, previously ingested data cannot be fixed by the upgrade process. Exception information received before the upgrade will still lack stack trace signatures, while exception information received after the upgrade will correctly contain the full signature information. As a result, the same exception may be reported twice in the Portal: once with an incomplete stack trace (counting occurrences received before the upgrade), and once with a complete stack trace (counting occurrences received after the upgrade).

If this behavior is undesirable, then the Workbench database should be purged and a full replay performed.

"Copy and Close" falls back to "Close" without Flash

The 'Share' feature of the Portal provides a dialog with a "Copy and Close" button that only works if Flash is installed in the user's browser. If it is not installed, the dialog just has a "Close" button.



Workbench Version 1.2.0. Copyright © 2016 PreEmptive Solutions, LLC