PreEmptive Analytics Data Hub User Guide

Troubleshooting

This section can help you resolve common issues encountered with the Data Hub.

Installation

If issues are encountered during installation, run the installer again with logging turned on:

  1. Open a command prompt.
  2. Run the installer with the arguments: /L*V "<logfile name>".
    • PreEmptive.Analytics.Data.Hub.exe /L*V "install_log.txt"
  3. Check the log file for errors and contact support if necessary.
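
Verbose installer logs can run to thousands of lines. The following is a minimal Python sketch for narrowing down where to look, assuming the log follows the usual Windows Installer conventions, in which a failed action is typically recorded with "Return value 3". The function name and the marker list are illustrative and are not part of the Data Hub.

```python
def find_suspect_lines(log_text, markers=("return value 3", "error")):
    """Return (line_number, line) pairs whose text contains any marker.

    Matching is case-insensitive and line numbers start at 1.
    The default markers are an assumption about what a failing
    installer log usually contains; adjust them as needed.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append((number, line.strip()))
    return hits
```

To use it, read install_log.txt into a string and pass the string to find_suspect_lines. Verbose installer logs are often UTF-16 encoded, so the file may need to be opened with encoding="utf-16"; this is an assumption to verify against your own log.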

Delivery Issues

When a destination does not accept a message, the Data Hub categorizes the failure as either an offline response or an error response, depending on the specific scenario. This section suggests steps for discovering what type of problem is preventing message delivery.

Durable Destinations

When dispatching to durable destinations, the Dispatch Service will queue unsuccessful messages for an offline retry or error retry at a later time. This prevents data loss while the issue with the destination is being diagnosed. If the issue is not resolved, however, the queued messages may eventually lead to disk-full or memory-full scenarios.

Typically, such situations should be diagnosed by examining the Event Log for messages that will explain the response received from the destination. You may also want to monitor the relevant WMI counters to ensure the messages are being received and routed as you expect them to be. If these tools are insufficient, you may need to use an HTTP proxy to trace the actual HTTP request and response, or directly view the queued message(s) that are causing the problem.

Using an HTTP Proxy

The Dispatch Service can be configured to use an HTTP proxy (e.g. Fiddler) to see the exact requests and responses that are transmitted between the Data Hub and a destination.

Before enabling an HTTP proxy to troubleshoot message retries, you may want to:

  • Reduce the retry intervals to a few minutes (or less). In particular, the error retry interval is 6 hours by default, so being able to capture the retry through the proxy would be very difficult at the default value.
  • Temporarily disable dispatching from the endpoint queue, so that all HTTP requests are driven by retries rather than by new messages. This can be done by setting the <dispatchFromEndpoint> setting to false.
    • Once the issue is resolved, you must remember to revert this setting to true.

To enable such a proxy:

  1. Open [Application folder]\DispatchService\HubDispatchService.exe.config.
  2. Under the <system.net> section, add the following, replacing the proxyaddress attribute appropriately:
    • <defaultProxy enabled="true"> <proxy proxyaddress="http://127.0.0.1:8888" bypassonlocal="False"/> </defaultProxy>
  3. Save the file.
  4. Restart the Dispatch Service.
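
The edit in step 2 can also be scripted. The following is a minimal Python sketch, assuming the configuration file has the usual .NET layout with <system.net> as a direct child of the root element. The function name enable_proxy is hypothetical. Python's default XML parser drops comments and the XML declaration, so treat the output as a preview of the change rather than a drop-in replacement for the file.

```python
import xml.etree.ElementTree as ET


def enable_proxy(config_xml, proxy_address):
    """Return config XML with one <defaultProxy> element under <system.net>."""
    root = ET.fromstring(config_xml)
    # Look for <system.net> among the root's direct children.
    system_net = next(
        (child for child in root if child.tag == "system.net"), None
    )
    if system_net is None:
        # Assumption: it is acceptable to create the section if it is missing.
        system_net = ET.SubElement(root, "system.net")
    # Remove any existing proxy element so the result contains exactly one.
    for existing in [c for c in system_net if c.tag == "defaultProxy"]:
        system_net.remove(existing)
    default_proxy = ET.SubElement(system_net, "defaultProxy", {"enabled": "true"})
    ET.SubElement(
        default_proxy,
        "proxy",
        {"proxyaddress": proxy_address, "bypassonlocal": "False"},
    )
    return ET.tostring(root, encoding="unicode")
```

Calling enable_proxy a second time with a different address replaces the earlier element instead of adding a duplicate. The Dispatch Service still needs to be restarted after the file changes.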

To disable such a proxy:

  1. Open [Application folder]\DispatchService\HubDispatchService.exe.config.
  2. Under the <system.net> section, remove the <defaultProxy> element.
  3. Save the file.
  4. Restart the Dispatch Service.

Viewing Queued Messages

In some cases, inspecting the contents of the messages at the front of a queue may be sufficient to understand an issue.

To view queued messages:

  1. Open the RabbitMQ management console's queue page as a user with administrative privileges.
  2. Click on the queue name in question to open that queue's detail page.
    • pahub.<destination id>_offline for offline responses.
    • pahub.<destination id>_errored for error responses.
  3. Locate the Get messages section. Expand if necessary.
  4. Choose Yes for the Requeue field. Incorrectly choosing No will cause data loss.
  5. Enter the number of messages to retrieve in the Messages field.
    • Large numbers of messages (e.g., over 1000) may cause the management console to freeze up.
  6. Click Get Message(s).
  7. Inspect the messages that appear.
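
RabbitMQ's management plugin also offers an HTTP API that can fetch messages from a queue. The following Python sketch only builds such a request and does not send it. It assumes the default virtual host "/" and the "ackmode" request field used by recent RabbitMQ releases; older releases expect a boolean "requeue" field instead, so check the API documentation served by your own management console before relying on it. The function name and the 1000-message cap are illustrative choices.

```python
import json
from urllib.parse import quote


def build_get_messages_request(destination_id, kind, count, vhost="/"):
    """Return (path, json_body) for fetching messages from a retry queue.

    kind is "offline" or "errored", matching the queue names above.
    Messages are requeued after reading, mirroring the Requeue = Yes
    choice in the console, so that no data is lost.
    """
    if kind not in ("offline", "errored"):
        raise ValueError("kind must be 'offline' or 'errored'")
    if not 1 <= count <= 1000:
        # Illustrative cap: very large requests may overwhelm the console.
        raise ValueError("count must be between 1 and 1000")
    queue = "pahub.{}_{}".format(destination_id, kind)
    path = "/api/queues/{}/{}/get".format(
        quote(vhost, safe=""), quote(queue, safe="")
    )
    # Assumption: field names follow recent RabbitMQ management API versions.
    body = {"count": count, "ackmode": "ack_requeue_true", "encoding": "auto"}
    return path, json.dumps(body)
```

The returned path and body would be sent as an HTTP POST to the management console's address, authenticated as a user with administrative privileges.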

Non-Durable Destinations

When dispatching to non-durable destinations, the Dispatch Service only tries messages once, and never queues messages. Because messages that receive error or offline responses are never retried, it can be harder to troubleshoot problems with non-durable destinations.

Some suggestions when dealing with non-durable destinations:

  • Examine the Event Log for messages related to this destination.
  • Monitor the various WMI counters related to this destination.
  • If an issue is discovered, consider making the destination durable, at least temporarily, to queue and retry problematic messages.
  • Configure a proxy to see the requests and responses between the Dispatch Service and this destination.

Disk-full Scenarios

In the event that disk space reaches a critical level, the Data Hub will no longer be able to queue incoming messages, causing an outage at the Endpoint.

First, in such a situation, check the Queues page of the RabbitMQ management console to see which (if any) queues are excessively large. Based on that information, there are three possible outcomes:

  1. None of the queues are large.
  2. The endpoint queue is very large.
  3. One or more of the destination queues are very large.
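
The triage above can be expressed as a small decision function. This is a sketch under stated assumptions: the endpoint queue name is passed in as a parameter because it is not given here, every other queue is treated as a destination queue, and the threshold for "very large" is site-specific. Because the endpoint queue and a destination queue can both be large at once, the function returns every outcome that applies.

```python
def classify_disk_full(queue_sizes, endpoint_queue, large_threshold):
    """Map RabbitMQ queue sizes to the numbered outcomes in this section.

    queue_sizes: dict of queue name -> message count.
    endpoint_queue: name of the endpoint queue (hypothetical parameter;
        look up the real name on the Queues page).
    large_threshold: message count treated as "very large".
    Returns a sorted list of outcome numbers (1, 2, and/or 3).
    """
    large = [
        name for name, size in queue_sizes.items() if size >= large_threshold
    ]
    if not large:
        return [1]  # None of the queues are large.
    outcomes = set()
    for name in large:
        # Outcome 2 is the endpoint queue; anything else counts as outcome 3.
        outcomes.add(2 if name == endpoint_queue else 3)
    return sorted(outcomes)
```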

1. None of the queues are large

  1. If this is the case, check the queue sizes in the WMI counters (PA Data Hub -> Endpoint Queue Size; PA Data Hub Dispatch Destinations by Source Queue -> Error Queue Size / Offline Queue Size) to confirm.
  2. If they confirm that the queues are small, check your logging settings and the log files / Event Log to see if they are consuming the disk space.
  3. If that's not the issue, the problem may lie outside the Data Hub components.

2. The endpoint queue is very large

This typically means that new messages are not being processed by the Dispatch Service. This could be because it is not running, because it is not configured with any destinations, or because it has been set not to dispatch messages from the endpoint queue.

Dispatch Service is not running

If the Dispatch Service is not running, all incoming messages will remain queued in the endpoint queue, filling the disk.

To check for this scenario:

Ways to resolve:

No destinations are configured

If the Dispatch Service is running but has no configured destinations, all incoming messages will remain queued in the endpoint queue, filling the disk.

To check for this scenario:

To resolve:

Dispatching from the endpoint queue is disabled

For troubleshooting purposes, it is possible to run the Dispatch Service with dispatching from the endpoint queue disabled (the <dispatchFromEndpoint> setting set to false), so that only retries are permitted.

To check for this scenario:

To resolve:

3. One or more of the destination queues are very large

This typically means that a destination has been offline for a long time, or that the destination is returning error responses for most messages.

Destination has been offline for a long time

If a destination is offline, messages will remain queued for that destination until it comes online again. If a destination is offline for an extended period, these messages will build up and may eventually fill the disk.

To check for this scenario:

  • Check whether any instance of the Destination Status counter, other than _all, _durable, and _non_durable, remains at 0 for an extended period of time.
  • Check the Event Log for details on the specific reason the destination is offline.
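
The counter check can be sketched as follows, assuming the Destination Status values have already been sampled into a dictionary by some other means. The function name and the min_samples default are illustrative and are not Data Hub settings.

```python
AGGREGATE_INSTANCES = {"_all", "_durable", "_non_durable"}


def find_offline_destinations(samples, min_samples=3):
    """Return counter instances whose Destination Status stayed at 0.

    samples: dict of counter instance name -> list of sampled values,
        oldest first. How the samples are collected is left open.
    min_samples: how many samples are required before a run of zeros
        counts as an extended period (an illustrative default).
    """
    offline = []
    for instance, values in samples.items():
        if instance in AGGREGATE_INSTANCES:
            continue  # Skip the aggregate instances.
        if len(values) >= min_samples and all(value == 0 for value in values):
            offline.append(instance)
    return sorted(offline)
```

Any instance returned here is a candidate for the Event Log check, which gives the specific reason the destination is offline.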

Ways to resolve:

  • Resolve the issue with the destination, or with the connection between the Data Hub and the destination.
  • Consider temporarily changing the URL of the destination to that of another Data Hub with more free disk space.
  • To prevent incoming messages from being queued for this destination, mark the destination as non-durable. This will cause all incoming messages to be tried once, then dropped if they are not accepted. Any queued messages will remain queued. This means new messages will not be delivered to this destination until it comes online again.
  • To prevent incoming messages from being attempted for this destination, while allowing already queued messages to be retried, set the exclude criteria to <all/>. This means new messages will never be delivered to this destination.
  • Remove some or all of the messages from the offline queue for the destination (i.e., pahub.<destination id>_offline). This causes data loss.

Destination is rapidly returning error responses

If a large number of messages for a particular destination are receiving error responses, there may be a larger problem with the destination or the connection between the Data Hub and the destination.

To check for this scenario:

Ways to resolve:

  • Resolve the issue with the destination, or with the connection between the Data Hub and the destination.
  • Reduce the value of the <errorGiveup> setting, so the Data Hub will discard the oldest messages from the error queue. You will probably also want to reduce the <errorRetry> setting (for example, to 5 minutes) so that the queue is processed quickly.
  • Consider temporarily changing the URL of the destination to that of another Data Hub with more free disk space.
  • To prevent incoming messages from being queued for this destination, mark the destination as non-durable. This will cause all incoming messages to be tried once, then dropped if they are not accepted. Any queued messages will remain queued. This means new messages that receive error responses will never be delivered to this destination.
  • To prevent incoming messages from being attempted for this destination, while allowing already queued messages to be retried, set the exclude criteria to <all/>. This means new messages will never be delivered to this destination.
  • Remove some or all of the messages from the error queue for the destination (i.e., pahub.<destination id>_errored). This causes data loss.

Memory-full Scenarios

The Data Hub is designed to run well within memory limits on any host that meets the minimum memory recommendation. Under certain scenarios (e.g. very large message sizes), however, it can consume all available memory, or individual components can hit their internal memory limits. In such cases throughput may be reduced, and in extreme cases an outage can occur at the endpoint.

To see which component is using the most memory, open Task Manager (taskmgr.exe). Check the following processes and refer to the associated sections for assistance:

  1. The erl.exe process: RabbitMQ.
  2. The PreEmptive Analytics Data Hub Dispatch Service or HubDispatchService.exe process: Dispatch Service.
  3. An IIS Worker Process or w3wp.exe process: Endpoint Web Service.

If none of the processes appear to be particularly memory-intensive, check the RabbitMQ logs ([RabbitMQ data folder]\log) for memory alarms. If they appear, see the RabbitMQ subsection for assistance.

If none of the above appear to be abnormal, the problem may lie outside of the Data Hub components.

1. RabbitMQ

If the Erlang process (erl.exe) is using excessive memory (more than 1GB) or if the log ([RabbitMQ data folder]\log) shows memory alarms, then RabbitMQ is likely experiencing problems.

Ways to resolve:

Then, re-check the memory usage of all components.

2. Dispatch Service

If the Dispatch Service is using excessive memory (more than 1GB), it is probably a temporary issue related to message size and data rate. If necessary, restart the Dispatch Service. Then, re-check the memory usage of all components.

If a more permanent solution is needed, try reducing the maximum outgoing connections and Dispatch Service concurrency settings.

3. Endpoint Web Service

If an IIS Worker Process is using more memory than expected (more than 500MB), it is probably a temporary issue related to message size and data rate. To resolve, restart the Endpoint Web Service; note that this will cause an outage on the Endpoint. Then, re-check the memory usage of all components.
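
The memory thresholds given in this section can be gathered into a simple check. This is a minimal sketch, assuming per-process memory figures in megabytes have already been read from Task Manager or a similar tool, and treating 1GB as 1024MB. The function and constant names are illustrative.

```python
# Thresholds in megabytes, taken from the text; 1GB is treated as 1024MB.
MEMORY_LIMITS_MB = {
    "erl.exe": ("RabbitMQ", 1024),
    "HubDispatchService.exe": ("Dispatch Service", 1024),
    "w3wp.exe": ("Endpoint Web Service", 500),
}


def components_over_limit(process_memory_mb):
    """Return (component, used_mb, limit_mb) for processes over their limit.

    process_memory_mb: dict of process image name -> memory use in MB.
    Processes that do not belong to the Data Hub are ignored.
    """
    flagged = []
    for process, used_mb in process_memory_mb.items():
        if process not in MEMORY_LIMITS_MB:
            continue
        component, limit_mb = MEMORY_LIMITS_MB[process]
        if used_mb > limit_mb:
            flagged.append((component, used_mb, limit_mb))
    return flagged
```

An empty result does not rule out a problem: RabbitMQ may still have logged memory alarms, so the RabbitMQ logs should be checked as well.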



Data Hub Version 1.5.0. Copyright © 2015 PreEmptive Solutions, LLC