PreEmptive Analytics Workbench User Guide

Custom Data Filters

A common use case for building custom Indexers is extracting Custom Data (key-value pairs) from incoming messages. The Workbench provides an enhanced API, called a Custom Data Filter, that simplifies aggregating on Custom Data. A Custom Data Filter automatically extracts the relevant information, generates a Query, and optionally sets up a Pattern.

Example: Query

All Custom Data Filters process incoming messages automatically and expose the extracted data through a generated Query. This Query reports how many times each combination of values for a specified set of keys was observed on a message.

The following example demonstrates the steps for creating a Custom Data Filter, using the Sample Custom Data Filter (installed as part of the Sample Plugin) as reference.

Query Goal

Suppose we are developing a game that is instrumented to report information about the player's character at login, by attaching that information as Custom Data to a Login feature message. Assume the application never sends these particular keys on any other kind of message. Some sample data might be:

{Character: Mage}, {Level: 13}, {Inventory: <long string>}
{Character: Warrior}, {Level: 11}, {Inventory: <long string>}
{Character: Mage}, {Level: 10}, {Inventory: <long string>}

There are a variety of ways to track this data, depending on how detailed we want to be:

  • To track just one of these fields (e.g., Character), the default Custom Data Indexer and Queries are sufficient (e.g., look up the Mage Value for the Character Key).
  • To correlate specific scenarios of these fields to information besides these fields (such as Custom Data from other messages), you will need to create an Indexer.

But let's say we want to track all of these values together, so we can answer questions like, "How many people log in as a Mage and are at level 10?"

The default Plugins don't correlate different Custom Data key-value pairs with each other, so that option isn't sufficient. We could make an Indexer with a Query, but the Custom Data Filter is easier, because it automatically performs extraction, publishing, and query creation from a simplified set of options.
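As a rough mental model of what "tracking the values together" means, the sketch below counts combinations of values with plain C# collections. It does not use the Workbench API; the class name CombinationCountSketch, the method Count, and the "|" separator are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Conceptual model only: plain collections, not the Workbench API.
public static class CombinationCountSketch
{
    // Counts how often each combination of values for the given keys appears.
    // Assumes every message contains every requested key.
    public static Dictionary<string, int> Count(
        IEnumerable<IReadOnlyDictionary<string, string>> messages,
        params string[] keys)
    {
        var counts = new Dictionary<string, int>();
        foreach (var customData in messages)
        {
            // Join the tracked values into one composite key, e.g. "Mage|10".
            string combo = string.Join("|", keys.Select(key => customData[key]));

            // TryGetValue leaves 'seen' at 0 when the combination is new.
            counts.TryGetValue(combo, out int seen);
            counts[combo] = seen + 1;
        }
        return counts;
    }
}
```

Calling Count(messages, "Character", "Level") on the three sample messages above would give one occurrence each for "Mage|13", "Warrior|11", and "Mage|10", so the Mage-at-level-10 question is answered by the "Mage|10" entry.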

Query Step 1: Create

Custom Data Filters subclass the PreEmptive.Workbench.Interfaces.CustomCounting.CustomDataCounterBase class.

The entire definition is contained within the arguments to the base constructor:

  • domain: The Domain used by the generated Query.
  • name: The Name of the generated Query.
  • sourceMessageTypes: The message types to extract.
  • customDataKeys: The Custom Data keys to track, each defined by:
    • name: The String used by the client application as a Custom Data key (e.g., Level).
    • allowLargeKeys: Whether values of this Custom Data may exceed 1024 characters. Defaults to false.
  • fieldKeyFactory: A field-creation utility used by the CustomDataCounterBase class. Pass it through to the base constructor; you will not need to interact with it directly.
  • applyAsPatternForEntireSession: Whether the keys will be used to qualify the entire Session's data in a Pattern (see below). Defaults to false.

public class SampleCustomDataFilter : CustomDataCounterBase
{
    public SampleCustomDataFilter(FieldKeyFactory fieldKeyFactory)
        : base(
            "PreEmptive.Sample", "VideoGame",        // domain and name of the generated Query
            new Type[] { typeof(FeatureMessage) },   // sourceMessageTypes
            new CustomDataCounterKey[] {             // customDataKeys: (name, allowLargeKeys)
                new CustomDataCounterKey("Character", false),
                new CustomDataCounterKey("Level", false),
                new CustomDataCounterKey("Inventory", true)
            },
            fieldKeyFactory,
            applyAsPatternForEntireSession: false    // Query only; no Pattern
        )
    {
    }
}

Note that we specified true for the second parameter (allowLargeKeys) of the Inventory key's constructor. This field might contain a String longer than 1024 characters, so it must be allowed to handle larger values.
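One way to decide which keys need allowLargeKeys is to scan a sample of real Custom Data before writing the Filter. The helper below is a hypothetical sketch in plain C#, not part of the Workbench API; the class and method names are assumptions, and the only value taken from this guide is the 1024-character limit.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the Workbench API.
public static class LargeValueCheckSketch
{
    // Limit quoted in this guide for keys that do not allow large values.
    public const int DefaultMaxLength = 1024;

    // Returns the keys that had at least one sampled value longer than the limit.
    public static ISet<string> KeysNeedingLargeValues(
        IEnumerable<IReadOnlyDictionary<string, string>> sampleMessages)
    {
        var keys = new HashSet<string>();
        foreach (var customData in sampleMessages)
        {
            foreach (var pair in customData)
            {
                if (pair.Value.Length > DefaultMaxLength)
                {
                    keys.Add(pair.Key);
                }
            }
        }
        return keys;
    }
}
```

A sample only shows what has been observed so far, so a key whose values could plausibly grow (such as Inventory) may still warrant true even if the scan does not flag it.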

Query Step 2: Deploy

Next, deploy the Plugin assembly.

After deployment, the metadata query should list an additional query whose domain and name match those defined by the Custom Data Filter. This query has filterable fields corresponding to the defined Custom Data keys, plus an additional field, CustomCount, that reports the number of occurrences of a particular set of values. Clients of the Query API can aggregate this query on one or more of the keys, and the API will report the number of instances of each set of values for those keys.
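To illustrate what aggregating on a subset of the keys means, the sketch below rolls per-combination counts up to a single key. It is a conceptual model in plain C#, not a call to the Query API; the VideoGameRow type and the ByCharacter method are hypothetical names, and only the CustomCount field name comes from this guide.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape for illustration; the real output is defined by the Workbench.
public sealed class VideoGameRow
{
    public string Character { get; set; }
    public string Level { get; set; }
    public int CustomCount { get; set; }
}

public static class AggregateByKeySketch
{
    // Sums CustomCount over every row that shares the same Character value.
    public static Dictionary<string, int> ByCharacter(IEnumerable<VideoGameRow> rows)
    {
        return rows
            .GroupBy(row => row.Character)
            .ToDictionary(group => group.Key, group => group.Sum(row => row.CustomCount));
    }
}
```

In this model, rows for (Mage, 13) and (Mage, 10) with one occurrence each collapse into a single Mage entry with a count of 2 when only Character is requested.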

Query Step 3: Update Portal

A Portal Query, Widget, and Report for this Sample Custom Data Filter are located in the Sample folder, under WorkbenchSamplePortalFiles. Place them in the Portal's local namespace, and remember to update config.json to display the new Report, local:custom-data-filter/video-game.

Query Limitations

There are a few limitations to using a Custom Data Filter:

  • The generated Query aggregates per-message and will count the number of times a particular set of values was seen, without regard to which session or feature it occurred on.
  • The generated Query treats the entire set of defined keys as a unit. If a message contains only a subset of these keys as Custom Data, none of the values will be counted in the Query.
  • Custom Data values with a large number of unique values, such as high-precision decimal values, may not be suitable for Custom Data Filters, as each unique value will be counted separately.
  • Patterns won't be usable on the Indexers and Queries produced.
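The "entire set of keys as a unit" limitation can be pictured as a completeness check applied to each message before it is counted. The sketch below models that rule in plain C#; it is not the Workbench implementation, and the names CompleteKeySetSketch, HasAllKeys, and CountedMessages are assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Conceptual model only: plain collections, not the Workbench API.
public static class CompleteKeySetSketch
{
    // True only when the message carries every tracked key.
    public static bool HasAllKeys(
        IReadOnlyDictionary<string, string> customData,
        IReadOnlyCollection<string> trackedKeys)
    {
        return trackedKeys.All(key => customData.ContainsKey(key));
    }

    // Number of messages that would contribute to the counts in this model.
    public static int CountedMessages(
        IEnumerable<IReadOnlyDictionary<string, string>> messages,
        IReadOnlyCollection<string> trackedKeys)
    {
        return messages.Count(customData => HasAllKeys(customData, trackedKeys));
    }
}
```

Under this model, a message that sends Character and Level but omits Inventory contributes nothing, which is why the client application should always send the full set of keys together.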

Example: Pattern

Some applications have a set of keys that occur only once per session. In this case, the Custom Data Filter can be used to filter all session, feature, and message data stored in the Workbench by its set of keys, in addition to the other functions outlined above.

Pattern Goal

Consider the previous example, where we implemented a Custom Data Filter to track certain Custom Data values reported by a game application. Let's add the additional condition that the Login feature message only occurs once per session.

Because our keys are now only reported once during the session, we can logically think of this Custom Data as applying to the entire session - and thus, we want to be able to filter all the Workbench data associated with a session (including child Scopes, like features) by the values of these keys. This allows us to answer questions like, "How many times was Feature X triggered by people playing as a Mage?"

This can be accomplished by adding a Pattern to our previous Custom Data Filter. In addition to the generated Query, most Queries will then have filterable fields corresponding to the defined set of keys, which can be used to filter those Queries by the keys' values.
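Conceptually, the Pattern treats the values reported at login as attributes of the whole session, so any feature in that session can be filtered by them. The sketch below models that idea in plain C#; it does not use the Workbench API, and FeatureEvent, SessionPatternSketch, and CountFeature are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical event shape for illustration only.
public sealed class FeatureEvent
{
    public string SessionId { get; }
    public string Feature { get; }

    public FeatureEvent(string sessionId, string feature)
    {
        SessionId = sessionId;
        Feature = feature;
    }
}

public static class SessionPatternSketch
{
    // Counts occurrences of a feature in sessions whose login reported the given character.
    // characterBySession maps a session ID to the Character value sent once at login.
    public static int CountFeature(
        IEnumerable<FeatureEvent> events,
        IReadOnlyDictionary<string, string> characterBySession,
        string feature,
        string character)
    {
        return events.Count(e =>
            e.Feature == feature
            && characterBySession.TryGetValue(e.SessionId, out string sessionCharacter)
            && sessionCharacter == character);
    }
}
```

Because the lookup is keyed by session, the model only gives a single answer when each session reports the keys once, which matches the once-per-session condition stated above.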

Pattern Step 1: Create

Create the Custom Data Filter the same way as in the Query-only case, but this time, set the applyAsPatternForEntireSession argument to true.

public class SampleCustomDataFilter : CustomDataCounterBase
{
    public SampleCustomDataFilter(FieldKeyFactory fieldKeyFactory)
        : base(
            "PreEmptive.Sample", "VideoGame",
            new Type[] { typeof(FeatureMessage) },
            new CustomDataCounterKey[] {
                new CustomDataCounterKey("Character", false),
                new CustomDataCounterKey("Level", false),
                new CustomDataCounterKey("Inventory", true)
            },
            fieldKeyFactory,
            applyAsPatternForEntireSession: true     // also apply the keys as a session-wide Pattern
        )
    {
    }
}

Pattern Step 2: Deploy

Next, deploy the Plugin assembly.

The queries listed by the metadata query should now contain filterable fields corresponding to the defined Custom Data keys. Clients of the Query API can aggregate and filter most queries based on one or more of the keys. Note that the generated Query is still present, as well.

Pattern Step 3: Update Portal

Modify the default_filters property in config.json to allow the Portal to filter its Reports by these new fields:

"default_filters":[
    {
        "label": "Date Range",
        "type": "DateRange",
        "domains": ["All"],
        "fields": ["Time"]
    },{
        "label": "Application",
        "type": "Application",
        "domains": ["All"],
        "fields": ["AppId_Version"]
    },{
        "label": "OS",
        "type": "Many",
        "domains": ["All"],
        "fields": ["OS"],
        "hidden": true
    },{
        "label": "Runtime",
        "type": "Many",
        "domains": ["All"],
        "fields": ["Runtime"],
        "hidden": true
    },{
        "label": "Location",
        "type": "Many",
        "domains": ["All"],
        "fields": ["Country"],
        "hidden": true
    },{
        "label": "Character",
        "type": "Many",
        "domains": ["All"],
        "fields": ["Character"]
    },{
        "label": "Level",
        "type": "Many",
        "domains": ["All"],
        "fields": ["Level"]
    },{
        "label": "Inventory",
        "type": "Many",
        "domains": ["All"],
        "fields": ["Inventory"]
    }
]

Now, the checkboxes on three of the Widgets in the Custom Data Filter Report can be used to filter most of the Portal.

Pattern Limitations

Some limitations exist when using the Filter as a Pattern, in addition to those that come with using it as a Query:

  • The instrumented application should not send more than one message per session that has the specified keys. If this happens, the query results are not well-defined.
  • Do not install a Filter as a Pattern if its keys are a subset of the keys of another Filter that is also installed as a Pattern. For instance, given the example above, do not also install a second Filter as a Pattern on just the keys Character and Level. Doing so will cause query results to be inconsistent.
  • In most cases, data specific to application runs will not be filterable. This is to ensure data defined by the Filter applies per-session, rather than applying to all sessions within an application run. The only exception is if the data arrives on an Application Life Cycle message - then the data will apply per-application run and all sessions created by that application run.
  • Using Filters as Patterns will increase the size of the database. Counting other Patterns that augment Indexers, as well as the built-in OS/Runtime/Location Indexer, we do not recommend installing more than 5 such Patterns. See our storage recommendations for details.
  • The built-in Tamper Query cannot be aggregated or filtered by this Pattern, because tamper messages are not guaranteed to occur during a session.
  • The built-in Custom Data Query cannot be aggregated or filtered by this Pattern, because it relies on dynamic Output Schemas.
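The subset rule in the list above can be checked before deployment by comparing the key sets of the Filters you plan to install as Patterns. The helper below is a hypothetical sketch in plain C#, not a Workbench feature; it assumes key names are compared case-sensitively, which this guide does not state.

```csharp
using System.Collections.Generic;

// Hypothetical pre-deployment check, not part of the Workbench API.
public static class PatternKeyOverlapSketch
{
    // True when either key set is contained in the other (equal sets included).
    // Assumption: key names are compared with the default, case-sensitive comparer.
    public static bool Conflicts(IEnumerable<string> firstKeys, IEnumerable<string> secondKeys)
    {
        var first = new HashSet<string>(firstKeys);
        var second = new HashSet<string>(secondKeys);
        return first.IsSubsetOf(second) || second.IsSubsetOf(first);
    }
}
```

For the example in this section, comparing the keys Character and Level against Character, Level, and Inventory reports a conflict, while two Filters with disjoint or only partially overlapping keys do not.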


Workbench Version 1.2.0. Copyright © 2016 PreEmptive Solutions, LLC