PreEmptive Analytics Workbench User Guide

Custom Indexers

The Workbench includes a number of default Indexers that extract the most commonly used data (fields) from incoming analytics messages and make them available for Server Queries. You may come across situations where the data provided by the default Indexers isn't sufficient for your goals; in such cases, you will need a custom Indexer (and a custom Server Query).

This page walks through a Sample Indexer (installed as part of the Sample Plugin) to help you understand the design considerations that went into creating it, so that you can learn to build your own custom Indexers.

Goal

The sample indexer counts application runs by how much memory the client device has available, in three categories:

  • less than 1 GB,
  • 1 - 4 GB, and
  • more than 4 GB.

This assumes the instrumented application generates PerformanceProbe messages.
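The bucketing rule can be sketched as a pure function before we look at the full Indexer. The class and method names below are illustrative only (they are not part of the Workbench API); the thresholds come from the sample's own cutoffs of 1 GB (1024 MB) and 4 GB (4096 MB):

```csharp
using System;

static class MemoryBucketing
{
    // Maps available memory (in MB) to the lower bound of its bucket,
    // mirroring the thresholds used by the sample Indexer:
    //   less than 1 GB -> 0, 1-4 GB -> 1024, more than 4 GB -> 4096.
    public static int GetBucket(int availableMemoryInMb)
    {
        if (availableMemoryInMb < 1024)
            return 0;
        if (availableMemoryInMb < 4096)
            return 1024;
        return 4096;
    }
}
```

In the actual Indexer (Step 4), this same rule is applied inline before the bucket label is stored in the temp-bin.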

Step 1: Choose Scope and Parent Scope(s)

The first step of building an indexer is to choose which temp-bin scope to use. Recall the four different scopes described on the Data-Processing Overview page.

In our sample Indexer, we want to count application runs. One of the default indexers is the Application Run Indexer; it already counts application runs and it uses the Application Run Scope. We want to use this scope as well, so that we get access to the same temporary bin as the Application Run Indexer and all its data.

Indexers subclass the PreEmptive.Workbench.Interfaces.Indexing.IndexerBase class. We start off by creating a SampleIndexer class that inherits appropriately, and call the base constructor, which takes the following arguments:

  • fieldKeyFactory: A field-creation utility used by the IndexerBase class. In most cases, you will not need to interact with this argument directly.
  • defaultNamespace: The Namespace of this Indexer. This must be unique among all installed Indexers.
  • scope: An IBinScope instance, which dictates what kind of temp-bin this Indexer operates on. We pass in an instance of ApplicationRunScope to indicate our temp-bins should contain information pertaining to an application run.
  • parentScopes: Optionally, parent IBinScope instances. We won't be defining Parent Scopes in this example - see the Advanced Page for an example of this option.

We merely need to state our dependencies in the SampleIndexer constructor; the Workbench's dependency injection system will automatically construct the Indexer with the appropriate arguments. Our Indexer will also perform diagnostic logging, so the constructor additionally requests and stores an IFunctionalLogger instance.

public const string Namespace = "Sample";
private readonly IFunctionalLogger _logger;
public SampleIndexer(FieldKeyFactory fieldKeyFactory, ApplicationRunScope applicationRunScope, IFunctionalLogger logger)
    : base(fieldKeyFactory, Namespace, applicationRunScope)
{
    _logger = logger;
}

Step 2: Define Prerequisite Indexer(s) and Message Type(s) to extract

The next step is to identify which other Indexers, and which message types, this Indexer relies on.

We want to count the number of application runs, which are represented by ApplicationLifeCycle messages, and the amount of memory, which is provided in PerformanceProbe messages. The most direct approach would be to process both message types, giving this Indexer all the information it needs to perform its calculation.

However, this would be a duplication of effort, because the Application Run Indexer, part of an included Plugin, already tracks application run information. Instead of copying the implementation of Application Run Indexer, we can use the same temp-bin as that Indexer and access the values it stores there. This way, we only need to process PerformanceProbe messages in our Indexer.

public override Type[] DefineMessageTypesToExtract()
{
    return new [] { MessageType.PerformanceProbe };
}

Note that the available message types that can be extracted are defined in PreEmptive.Workbench.Interfaces.Indexing.MessageType.

We also define the ApplicationRunIndexer as a prerequisite of our Indexer. This ensures that our Indexer is only used by the Workbench if the Application Run Indexer is also installed.

public override Type[] DefinePrerequisiteIndexers()
{
    return new[] { typeof(ApplicationRunIndexer) };
}

Note: This does not affect the order in which message processing occurs; it merely prevents our Indexer from activating if the Application Run Indexer's plugin is missing.

Step 3: Define Fields

The third step is to define the fields that you want this Indexer to add to the temporary bin and, eventually, to permanent storage.

In our sample Indexer, the number of application runs needs to be grouped into three buckets based on the amount of available memory, so we need to create a field to represent which memory bucket a run belongs to. Let's name our field MemoryBucket.

public const string MemoryBucket = "MemoryBucket";
protected override void DefineFields()
{
    DefineField(MemoryBucket, typeof(string), FieldType.PivotKey);
}

Recall the properties that define fields. For the MemoryBucket field, these are:

  • Namespace: "Sample" (automatic based on Indexer's namespace)
  • Name: "MemoryBucket"
  • Type: string, because we won't be doing numerical operations on the bucket labels
  • FieldType: PivotKey, because we want to be able to organize our application run data based on this field

All other properties take their default values.

This Indexer also references the number of Application Runs started (ApplicationRunStartCount, in the ApplicationRun namespace), but it doesn't have to define that field, because the Application Run Indexer already does.

Step 4: Extract Data

Next we implement the Extract method, which reads incoming analytics messages and writes information to temporary storage. This method is the main part of the aggregation pipeline's Extract phase.

In our sample Indexer, we will extract memory information from the PerformanceProbe messages, sort it into one of three memory buckets, and store it in the temporary bin under the MemoryBucket field.

public override ExtractResult Extract(IStateBin tempStateBin,
    EnvelopeAttributes envelopeAttributes,
    Message message)
{
    var performanceProbeMessage = message as PerformanceProbeMessage;
    if (null == performanceProbeMessage)
        return new ExtractResult(false);
    //Adjust memory into one of three buckets
    var availableMemoryInMb = performanceProbeMessage.MemoryMBAvailable;
    if (availableMemoryInMb < 1024)
        availableMemoryInMb = 0;
    else if (availableMemoryInMb < 4096)
        availableMemoryInMb = 1024;
    else
        availableMemoryInMb = 4096;
#if DEBUG
    _logger.Log("CustomSampleIndexer", "This is a sample logging message recording {Bytes}",
        null,
        LoggingLevel.Info,
        new KeyValuePair<string,object>("Bytes", availableMemoryInMb));
#endif
    tempStateBin.AddValue(GetFieldKey(MemoryBucket), availableMemoryInMb.ToString());
    return new ExtractResult(true);
}

The arguments taken by this method are:

  • tempStateBin, the temp-bin we will be modifying. Note that the instance passed to this method will always be an Application Run temp-bin, because that is the scope we passed to IndexerBase in our Indexer's constructor.
  • envelopeAttributes, a set of information about the original Envelope that this message was transmitted in. We don't use this argument in this Indexer.
  • message, the analytics message we will be reading from.

The method determines what bucket this application run should fall under, then adds the MemoryBucket field to the temp-bin (with the appropriate bucket name as its value), and finally tells the Workbench that the temp-bin was updated, via the ExtractResult.

Note the use of logging in this method as well. By default, the Computation Service (which will ultimately call this method) writes log messages to the Windows Event Log if the logger name begins with "Custom" and the message is at least at the Info level. Thus, if this Plugin is compiled with a debugging profile, the Workbench will log a message every time a PerformanceProbe message is extracted by this method.

Step 5: Define Indexer Transforms

Some Indexers may need to correlate extracted data across multiple messages before publishing. Indexer Transforms provide a simple way to do this.

However, for this Indexer, we do not need to perform any transforms, so our implementation will just return null.

public override Transform[] DefineTransforms()
{
    return null;
}

Step 6: Define Output Schemas

Finally, we need to define how information from temporary storage is written to permanent storage by using an Output Schema. This determines when the publishing phase of the pipeline is activated.

An Output Schema consists of:

  • Name - must be unique among all Indexers. In our example, it's "SampleSchema".
  • Indexer - a reference to the associated Indexer (i.e., this).
  • PivotKeys - the set of PivotKey fields that will be published.
  • RequiredFields - the set of Data fields that will be published.

An Output Schema is triggered when all RequiredFields and PivotKeys are set in the Indexer's temp-bin.

public override OutputSchema[] DefineOutputSchemas()
{
    return new []
        {
            new OutputSchema("SampleSchema", this)
            {
                PivotKeys = new HashSet<FieldKey>
                            {
                                GetFieldKey(MemoryBucket)
                            },
                RequiredFields = new HashSet<FieldKey>
                            {
                                GetFieldKey(ApplicationRunIndexer.Namespace, ApplicationRunIndexer.ApplicationRunStartCount)
                            }
            }
        };
}
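The trigger condition for an Output Schema can be sketched as a pure predicate. This is a hypothetical illustration, not part of the Workbench API: a schema fires once every PivotKey and RequiredField has been set in the temp-bin.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OutputSchemaSketch
{
    // Hypothetical illustration of the publishing rule: an Output Schema
    // is triggered when all of its PivotKeys and RequiredFields are
    // present in the temp-bin.
    public static bool IsTriggered(
        ISet<string> fieldsSetInBin,
        IEnumerable<string> pivotKeys,
        IEnumerable<string> requiredFields)
    {
        return pivotKeys.All(fieldsSetInBin.Contains)
            && requiredFields.All(fieldsSetInBin.Contains);
    }
}
```

For the sample schema, publishing therefore waits until both Sample.MemoryBucket and ApplicationRun.ApplicationRunStartCount have been set in the temp-bin.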

Notice that we can reference not only fields defined by this Indexer (i.e., MemoryBucket) but also fields defined by other Indexers (in this case, ApplicationRunStartCount from the Application Run Indexer).

Additional fields may be added to all Output Schemas via Patterns, and the Workbench also automatically associates publish operations with Application and Date Range.

Step 7: Create a Server Query

Before the Portal, or other clients of the Query API, can retrieve the data generated by this Indexer, it must have access to a Server Query that can provide it. Please see the section on Query creation for details on how to create a custom Server Query.

Step 8: Deploy

After creating your Indexer and Query, deploy the Plugin assembly.

Once data has arrived and been processed by this Indexer, the custom Server Query will be available in the Query API, as indicated by the Metadata Query. If you encounter any issues, please see the Troubleshooting section.

Step 9: Update the Portal to Show or Use the Data

In order to see the data in the Portal, you will need to create or update Portal-side queries, transformations, widgets, and/or reports. Please see the Rendering Configuration documentation for details.



Workbench Version 1.2.0. Copyright © 2016 PreEmptive Solutions, LLC