PreEmptive Analytics Workbench User Guide

Advanced Computation Concepts

Prior sections have covered basic use of the Workbench Plugin components, such as Indexers and Queries. This page, along with the full pipeline description, details the advanced capabilities of the Workbench's data-processing system.

Note: For brevity, this page uses excerpts and pseudocode when discussing specific Plugin source code. The source for the Sample Plugin can be found in the <Installation Folder>\Sample Plugin folder. Source for default Plugins can be obtained from PreEmptive Solutions.

Indexers

At their simplest, Indexers need only know how to extract data from their own types of messages, and how to publish it. At their most advanced, Indexers are collaborative components, able to share data with other Indexers (even of different Scopes), calculate data over multiple messages, and even add fields and Output Schemas to the temp-bin dynamically.

Temp-bin Sharing

Temporary bins, which hold extracted data from a particular application run, session, feature, or message, can be shared among Indexers. In fact, our Sample Indexer does this - it shares the same Application Run-scoped temp-bin with the Application Run Indexer.

As a reminder, here is pseudocode for the Sample Indexer:

Scope: Application Run
Prerequisite Indexers: [ApplicationRunIndexer]
Message Types To Extract: [PerformanceProbe]
Fields:
    MemoryBucket (Pivot Key)
Extract:
    x = the available memory indicated by message
    if x < 1024:
        bucket = "0"
    else if x < 4096:
        bucket = "1024"
    else:
        bucket = "4096"
    addValueToTempBin(MemoryBucket, bucket)
Indexer Transforms: []
OutputSchemas:
[
    {
        Name = SampleSchema
        RequiredFields = [ApplicationRunStartCount (from ApplicationRunIndexer)],
        PivotKeys = [MemoryBucket] // And others added by Patterns
    }
]

Now let's look at how a few messages will be processed when using both this Indexer and the Application Run Indexer.

  • Message of type ApplicationLifeCycle, with messageGroupID = 1, arrives.
    1. The Computation Service looks up the Indexers that process the message.
    2. The Application Run Indexer is found, and the Scope type ("Application Run") is retrieved.
    3. Since this is the first application run message with messageGroupID = 1, the Computation Service provides an empty bin scoped by this application run.
    4. The Extraction code for the Application Run Indexer stores a value for ApplicationRunStartCount within the bin.
    5. The Computation Service scans the temp-bin and realizes that the Output Schema named ApplicationStartInformation has been satisfied for the Application Run Indexer, so the data (ApplicationRunStartCount) and Pivot Keys are published to permanent storage.
  • Message of type PerformanceProbe, with messageGroupID = 1, arrives.
    1. The Computation Service looks up the Indexers that should process the message.
    2. The Sample Indexer is found, and the Scope type ("Application Run") is retrieved.
    3. Since this message has the same messageGroupID (1) as the ApplicationLifeCycle message, the Computation Service provides the same bin.
    4. The bin already contains a value for ApplicationRunStartCount.
    5. The Extraction code for the Sample Indexer stores a value for MemoryBucket within the bin.
    6. The Computation Service scans the temp-bin and realizes that the Output Schema named SampleSchema has been satisfied for the Sample Indexer, so the data (ApplicationRunStartCount, as extracted by the Application Run Indexer) and Pivot Keys (including MemoryBucket) are published to permanent storage.

So even though the ApplicationRunStartCount field is defined and extracted by the Application Run Indexer, the Sample Indexer can still use it, since it shares the same temp-bin.

Note that sharing temp-bins can produce inconsistent data, depending on the order in which messages are received and processed. The AddValue method has an optional boolean parameter, dontSetIfAlreadyThere: when it is true (the default), only the first value seen for a field is kept; when it is false, later values overwrite earlier ones. We recommend leaving the flag at its default, for both performance and data-consistency reasons. In particular, setting the AllowCopyToChildren flag (discussed in the next section) to true while setting dontSetIfAlreadyThere to false allows scenarios in which processing a single message must touch all of a bin's children, halting message ingestion. Because of these concerns, the dontSetIfAlreadyThere flag will be deprecated in a future release.
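As a rough sketch (the exact AddValue overload is not shown in this guide's excerpts, so the named parameter below is illustrative rather than authoritative), an Extract method that sets the flag explicitly might look like this:

public override ExtractResult Extract
    (IStateBin tempStateBin, EnvelopeAttributes envelopeAttributes, Message message)
{
    var bucket = "0"; // bucketing logic omitted; see the pseudocode above
    // Keep only the first value seen for this field (the default, recommended behavior).
    tempStateBin.AddValue(GetFieldKey(MemoryBucket), bucket, dontSetIfAlreadyThere: true);
    // Passing false instead would let later messages overwrite the stored value, which is
    // discouraged for the performance and data-consistency reasons noted above.
    return new ExtractResult(true);
}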

Temp-bin Hierarchy

Recall that Indexers can also define a hierarchy for temp-bins.

For example, the default Feature Indexer operates on FeatureMessages, and thus uses Feature Scope. Session Scope is the parent scope for the Feature Scope (because the Feature Indexer defines Session Scope as a parent), which itself has Application Run Scope as a parent (because the Session Indexer defines Application Run Scope as a parent).

As a result, information stored in temp-bins with Application Run Scope (such as MemoryBucket, from our Sample Indexer) is copied down to the associated temp-bin with Feature Scope, provided the field is defined with AllowCopyToChildren set to true. Fields are also copied down automatically if they are used in Patterns.

By default, AllowCopyToChildren is false, because most information is not relevant outside of its defining Indexer. However, it makes sense to set it to true when you want to access the information from a Query without using a Delayed Action (detailed below); as long as you only need to read the value directly, without additional processing at computation time, a Delayed Action is not required.
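The field-definition API is not excerpted in this guide, so the following is only a conceptual sketch; FieldDescription and its property names are hypothetical stand-ins for however your Plugin actually declares its fields.

// Hypothetical sketch only: the real field-definition types in the Workbench SDK may differ.
// The point is that copying to child scopes is an opt-in, per-field setting.
var memoryBucket = new FieldDescription            // hypothetical type
{
    Name = "MemoryBucket",
    DataType = typeof(string),
    IsPivotKey = true,                             // hypothetical property
    AllowCopyToChildren = true                     // copy this value into Session and Feature bins
};

var startTime = new FieldDescription
{
    Name = "StartTime",
    DataType = typeof(DateTime),
    AllowCopyToChildren = false                    // default: the value stays in its own scope
};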

Imagine we modified the Feature Indexer to also use MemoryBucket as a Pivot Key (in reality, we would make this happen automatically using a Pattern, but for our example we'll assume we statically added the field to the Feature Indexer). Here is an abbreviated pseudocode for such a modified Indexer:

Scope: Feature
Prerequisite Indexers: []
Message Types To Extract: [FeatureMessage] (FeatureStart, FeatureStop and FeatureTick)
Extract:
    addValueToTempBin(FeatureName, message.FeatureName);
    addValueToTempBin(FeatureCount, 1);
Indexer Transforms:    []
OutputSchemas: 
[
    {
        Name = FeatureSchema, 
        RequiredFields = [FeatureCount]
        PivotKeys = [FeatureName, MemoryBucket (from Sample Indexer)] // And others added by Patterns
    }
]

Then, let's supplement the example from the Temp-bin Sharing section with two new messages:

  • Message of type SessionLifeCycle, with messageGroupID = 1, sessionID = S1, arrives.
    1. The Computation Service looks up the Indexers that should process the message.
    2. The Session Indexer is found and the scope ("Session") is retrieved.
    3. Since this is the first message with sessionID = S1, the Computation Service provides an empty Session bin scoped by sessionID.
    4. Since this message has the same messageGroupID (1) as the messages from the previous section, the Computation Service retrieves that Application Run bin as a parent.
    5. MemoryBucket is copied from the parent bin to the child bin. If anything changes in the Application Run (parent) bin, the Session (child) bin will also be updated appropriately.
    6. The Session Indexer continues its extraction, but the details aren't relevant for this example.
  • Message of type FeatureTick, with messageGroupID = 1, sessionID = S1, and featureID = F1, arrives.
    1. The Computation Service looks up the Indexers that should process the message.
    2. The Feature Indexer is found and the scope ("Feature") is retrieved.
    3. Since this is the first message with featureID = F1, the Computation Service provides an empty Feature bin scoped by FeatureID.
    4. Since this message has the same sessionID (S1) as the previous message, the Computation Service retrieves the same Session bin as a parent.
    5. MemoryBucket is again copied from the parent bin to the child bin. If anything changes in the Application Run (grandparent) bin, the Session (parent) and Feature (child) bins will also be updated appropriately.
    6. The Extraction code for the Feature Indexer stores values for FeatureName and FeatureCount within the Feature temp-bin.
    7. The Computation Service scans the temp-bin and realizes that the Output Schema named FeatureSchema has been satisfied, so the data (FeatureCount) and Pivot Keys (including FeatureName and the inherited MemoryBucket) are published to permanent storage.

Note that fields declared with AllowCopyToChildren as false will not be subject to this sharing behavior. For instance, both the Session Indexer and Feature Indexer use the field StartTime in the Standard namespace, since they both call the helper method ExtractTime. However, this field is declared to not allow copying-to-children, so the start timestamp of a session will not accidentally be counted as the start timestamp of a child feature.

Indexer Transforms

In most cases, data extracted from messages (using the Extract method) will be sufficient for publishing to permanent storage directly. However, in some cases, you may want to have fields that are calculated from extracted information, even over multiple messages.

For instance, in order to calculate the length of a session, the default Session Indexer must have the timestamps of both the Session Start and Session Stop messages, and messages are not guaranteed to arrive over the network in any particular order.

The Indexer API provides a way to modify the temp-bin when certain fields exist in the bin. These actions are called Indexer Transforms. Each Transform specifies what fields are required before it runs, and the actions the Workbench should perform as a result. The Workbench then carries out these steps when the given fields are available.

For example, this excerpt of the Session Indexer requires the start and stop time of the session before calculating the difference and writing that as the session length:

public override Transform[] DefineTransforms()
{
    return new[]
    {
        new Transform("LengthTransform")
        {
            RequiredFields = new []
            {
                GetFieldKey(StandardFields.Namespace, StandardFields.StartTime), 
                GetFieldKey(StandardFields.Namespace, StandardFields.StopTime)
            },
            TransformActions = fields =>
            {
                TimeSpan sessionLength = 
                    fields.GetEntry<DateTime>
                        (StandardFields.Namespace, StandardFields.StopTime)
                    .Subtract(fields.GetEntry<DateTime>
                        (StandardFields.Namespace, StandardFields.StartTime));
                if (sessionLength.TotalSeconds < 0)
                {
                    return new TransformAction[]{};
                }
                return new[]
                {
                    new TransformAction
                    {
                        Key = GetFieldKey(SessionLength),
                        Value = sessionLength
                    },
                    new TransformAction
                    {
                        Key = GetFieldKey(MaxSessionLength),
                        Value = sessionLength
                    },
                    new TransformAction
                    {
                        Key = GetFieldKey(MinSessionLength),
                        Value = sessionLength
                    }
                };
            }
        }
    };
}

Here, once both the Session Start and Session Stop messages have been received, the Indexer sets three fields, all derived from the session length (the only difference between them is the Merge Option specified when each is defined). We can also see that if the difference turns out to be negative (and thus invalid), the Transform returns an empty array of TransformAction and the temp-bin is not modified at all.

Delayed Actions

Messages may be delivered to the Workbench in any order. In the Indexer Transforms section, we discussed how to deal with a Session Stop arriving before a Session Start message. However, in that example, the messages were both extracted by the same Indexer, and in the same Scope.

It is also possible that a message ordering constraint applies across multiple Indexers. For instance, the Feature Indexer tracks which Features have been observed and shares this information across the entire Session. When a Feature message is extracted, it may have arrived prior to any associated Session messages, meaning the Session temp-bin will not have been created yet. Because of this, we need a way for the Feature Indexer to execute code after the Session bin has been created.

The Indexer API provides Delayed Actions to address this case. Delayed Actions are registered as part of the ExtractResult returned by an Indexer's Extract method, and are called back once the parent of the Indexer's temp-bin has been created. These delegates must match the signature:

static void ActionName(IStateBin childTempBin, IStateBin parentTempBin, FieldKeyFactory factory)

Warning: Delayed Actions should only be used to add data to a temp-bin, or to update data that was previously added by an earlier Delayed Action. They should not be used to retrieve data from a temp-bin; when that is necessary, the AllowCopyToChildren mechanism is preferable. This is because a Delayed Action runs only once, when the parent temp-bin is created, rather than every time that bin is updated by an Indexer. Different Indexers create temp-bins with different initial data, depending both on the Indexer itself and on the order in which messages arrive, so the data to be retrieved might not be present yet, and the Delayed Action will not get another chance to retrieve it.

In our example, the Feature Indexer's Extract method registers a Delayed Action; then, once the necessary Session temp-bin has been created, the Delayed Action can modify both the Session temp-bin and the Feature temp-bin. Here's an excerpt from the Feature Indexer:

public override ExtractResult Extract
    (IStateBin tempStateBin, EnvelopeAttributes envelopeAttributes, Message message)
{
    // ...omitted...
    if (EventCodes.FeatureStart == featureMessage.Event.Code)
    {
        tempStateBin.AddValue(GetFieldKey(FeatureStartCount), 1);
        return new ExtractResult(true)
        {
            DelayedAction = GetSessionAction
        };
    }
    // ...omitted...
}
private static void GetSessionAction
    (IStateBin featureTempStateBin, IStateBin sessionTempStateBin, FieldKeyFactory fieldKeyFactory)
{
    var featureName = 
        featureTempStateBin.GetValue(fieldKeyFactory.GetFieldKey(Namespace, FeatureName)) as string;
    var featureSeenInSession = 
        sessionTempStateBin.GetValue(fieldKeyFactory.GetDynamicFieldKey
            (Namespace, FeatureSeenInSession, featureName));
    if (featureSeenInSession == null)
    {
        featureTempStateBin.AddValue
            (fieldKeyFactory.GetFieldKey(Namespace, SessionCount), 1);
        sessionTempStateBin.AddValue
            (fieldKeyFactory.GetDynamicFieldKey(Namespace, FeatureSeenInSession, featureName), true);
    }
}

Let's look at how out-of-order messages will be extracted in this case:

  • Message of type FeatureTick, with messageGroupID = 1, sessionID = S1 and featureID = F1, arrives.
    1. The Computation Service looks up the Indexers that should process the message.
    2. The Feature Indexer is found and the scope ("Feature") is retrieved.
    3. Since this is the first message with sessionID = S1 and featureID = F1, the Computation Service provides an empty Feature bin scoped by sessionID and featureID.
    4. The Extraction code for the Feature Indexer stores a value for FeatureStartCount within the Feature temp-bin.
    5. Since the Extraction code specified a Delayed Action, it is registered on that temp-bin.
  • Message of type SessionStart, with messageGroupID = 1, sessionID = S1, arrives.
    1. The Computation Service looks up the Indexers that process the message.
    2. The Session Indexer is found, and the Scope ("Session") is retrieved.
    3. Since this is the first Session scope with sessionID = S1, the Computation Service provides an empty bin scoped by sessionID.
    4. The Computation Service recognizes that this Session temp-bin has the same sessionID as the previous Feature temp-bin, so the Session temp-bin becomes the parent of the Feature temp-bin, and the Delayed Action is run.
    5. The Delayed Action stores values in both the Session temp-bin and the Feature temp-bin.
    6. The Extraction code for the Session Indexer stores values within the Session temp-bin.

Dynamic Output Schemas

It is possible to define an Indexer's Output Schemas at extraction time, rather than before compiling the Plugin. This is necessary if you do not know ahead of time what sort or amount of information the Indexer will need to publish.

Note: Patterns are not applied to Dynamic Output Schemas; see the Patterns section for details.

For example, any message may contain Custom Data (in the form of key-value pairs). A game application might send something like {"CharacterClass", "Mage"}. The default Custom Data Indexer, which operates in the Message Scope, is responsible for counting the number of times any particular pair occurs (along with indexing these by the message's source event). The Indexer needs to publish each pair separately.

If each message could contain only one such pair, the Output Schema could be defined normally. However, because any particular message may contain an arbitrary number of key-value pairs, we do not know, when designing the Indexer, how many times we will need to publish. Therefore, we cannot define any Output Schemas ahead of time:

public override OutputSchema[] DefineOutputSchemas()
{
    return null;
}

Instead, when we actually receive a message with custom data, we dynamically create both temp-bin fields and Output Schemas at runtime. Here is an excerpt from the Custom Data Indexer's Extract method:

public override ExtractResult Extract(IStateBin tempStateBin, EnvelopeAttributes envelopeAttributes, Message message)
    {
        if (message.ExtendedInformation.Any())
        {
            var resolvedSource = String.Empty;
            // ... omitted ...
            foreach (var extendedKey in message.ExtendedInformation)
            {
                var eventCodeDynamicKey = GetDynamicFieldKey(EventCode, extendedKey.Key);
                var sourceDynamicKey = GetDynamicFieldKey(Source, extendedKey.Key);
                var keyDynamicKey = GetDynamicFieldKey(Key, extendedKey.Key);
                var countDynamicKey = GetDynamicFieldKey(Count, extendedKey.Key);    
                switch (extendedKey.DataType)
                {
                    case "string":
                        //create dynamic descriptions
                        var stringValueDynamicKey = GetDynamicFieldKey(StringValue, extendedKey.Key);
                        //put in temp bin entries 
                        tempStateBin.AddValue(eventCodeDynamicKey, message.Event.Code);
                        tempStateBin.AddValue(sourceDynamicKey, resolvedSource);
                        tempStateBin.AddValue(keyDynamicKey, extendedKey.Key);
                        tempStateBin.AddValue(stringValueDynamicKey, extendedKey.Value);
                        tempStateBin.AddValue(countDynamicKey, 1);
                        //create dynamic output schemas
                        tempStateBin.AddOutputSchema(
                            new OutputSchema(extendedKey.Key, this)
                            {
                                RequiredFields = new HashSet<FieldKey>
                                                 {
                                                    countDynamicKey  
                                                 },
                                PivotKeys = new HashSet<FieldKey>
                                            {
                                                eventCodeDynamicKey,
                                                sourceDynamicKey,
                                                keyDynamicKey,
                                                stringValueDynamicKey
                                            } 
                            });
                        break;
                    // ... omitted ...
                }
             }
            return new ExtractResult(true);
        }
        return new ExtractResult(false);
    }

For each key-value pair (extendedKey), we dynamically create new fields in the temp-bin (but not the permanent store) by calling GetDynamicFieldKey. A dynamic key copies the form of a statically-defined field (in terms of type, Merge Option, etc.) but has an additional distinguishing instance name, allowing multiple instances of the same Field to coexist in a temp-bin. For instance, GetDynamicFieldKey(Count, extendedKey.Key) creates a field in the temp-bin with the name, type, and Merge Option of the Custom Data Indexer's Count field, and with an instance name equal to the key-value pair's key.
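To make the distinction concrete, here is a minimal illustration, assuming the same helpers used in the Extract excerpt above; the "CharacterClass" and "Inventory" strings are simply example Custom Data keys.

// The static field key refers to the single Count field defined by the Indexer.
var staticCount = GetFieldKey(Count);

// Dynamic field keys reuse Count's name, type, and Merge Option, but each carries an
// instance name, so several instances can coexist in the same temp-bin.
var classCount = GetDynamicFieldKey(Count, "CharacterClass");   // instance "CharacterClass"
var inventoryCount = GetDynamicFieldKey(Count, "Inventory");    // instance "Inventory"

// Each instance accumulates its own value independently.
tempStateBin.AddValue(classCount, 1);
tempStateBin.AddValue(inventoryCount, 1);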

Then, once we have determined the type of the pair's value, we define one more dynamic field, and fill all of these fields with values appropriately. Finally, we add an Output Schema to the temp-bin directly, using the AddOutputSchema method, to ensure these dynamically-defined fields are written to permanent storage.

However, when publishing to permanent storage, we want the values to be aggregated by the base field, rather than the dynamic field. For instance, we want the permanent store to contain just one column for all Custom Data keys (Key) rather than a column for each dynamic key (Key-CharacterClass, Key-Inventory, etc). To make this translation, an additional type must be defined in the Plugin: a Field Key Mapper. Field Key Mappers implement the AggregationInterfaces.Schema.IFieldKeyMapper interface. Consider the Custom Data Indexer Plugin's CustomDataMapper:

using AggregationInterfaces.Schema;
namespace PreEmptive.Analytics.Workbench.Plugins.CustomData
{
    /// <summary>
    /// Maps field keys to their non-dynamic counterparts, if applicable.
    /// </summary>
    public class CustomDataMapper : IFieldKeyMapper
    {
        private readonly FieldKeyFactory _fieldKeyFactory;
        public CustomDataMapper(FieldKeyFactory fieldKeyFactory)
        {
            _fieldKeyFactory = fieldKeyFactory;
        }
        /// <returns>Whether this mapper applies to this field key</returns>
        public bool CanMap(FieldKey fieldKey)
        {
            // true if this field key is defined by the CustomDataIndexer
            return fieldKey.Namespace == CustomDataIndexer.Namespace;
        }
        /// <returns>How this field key should be transformed in temporary storage</returns>
        public TempBinMap MapTempBinFieldKey(FieldKey fieldKey)
        {
            // Map this key to itself (we want dynamic keys to stay dynamic in temp-bins)
            return new TempBinMap(_fieldKeyFactory.GetFieldKey(fieldKey.Namespace, fieldKey.Name), 0);
        }
        /// <returns>The field key that should be used in permanent storage for this field key</returns>
        public FieldKey MapAggregationBinFieldKey(FieldKey fieldKey)
        {
            // Map the key to the static version of itself
            // by setting "includeDynamic" false.
            return _fieldKeyFactory.GetFieldKey(
                fieldKeyNamespace: fieldKey.Namespace,
                name: fieldKey.Name,
                includeDynamic: false);
        }
    }
}

The relevant method for this discussion is MapAggregationBinFieldKey, which, given a temp-bin field key, determines which permanent-store field should be used for the publish operation. If you are using dynamic keys and Output Schemas in your Indexer, you will need to define a similar IFieldKeyMapper implementation in your Plugin, adjusting for the different namespace(s) appropriately.
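For example, a mapper for another Plugin could mirror the CustomDataMapper almost exactly; typically only the CanMap test changes. The namespace and the MyIndexer class below are hypothetical placeholders for your own Plugin's types.

using AggregationInterfaces.Schema;
namespace MyCompany.Workbench.Plugins.MyPlugin   // hypothetical plugin namespace
{
    public class MyPluginMapper : IFieldKeyMapper
    {
        private readonly FieldKeyFactory _fieldKeyFactory;
        public MyPluginMapper(FieldKeyFactory fieldKeyFactory)
        {
            _fieldKeyFactory = fieldKeyFactory;
        }
        // Only claim field keys defined by this Plugin's Indexer (MyIndexer is hypothetical).
        public bool CanMap(FieldKey fieldKey)
        {
            return fieldKey.Namespace == MyIndexer.Namespace;
        }
        // Keep dynamic keys dynamic while they live in temp-bins.
        public TempBinMap MapTempBinFieldKey(FieldKey fieldKey)
        {
            return new TempBinMap(_fieldKeyFactory.GetFieldKey(fieldKey.Namespace, fieldKey.Name), 0);
        }
        // Collapse each dynamic key onto its static counterpart for permanent storage.
        public FieldKey MapAggregationBinFieldKey(FieldKey fieldKey)
        {
            return _fieldKeyFactory.GetFieldKey(
                fieldKeyNamespace: fieldKey.Namespace,
                name: fieldKey.Name,
                includeDynamic: false);
        }
    }
}

The CanMap check is what routes your Plugin's field keys to this mapper; the rest can usually stay identical to the CustomDataMapper.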

Let's walk through the entire process with a sample message.

  • Message with Custom Data [ {"CharacterClass", "Mage"}, {"Inventory", "1x Pickaxe"} ] arrives.
    1. The Computation Service looks up the Indexers that process the message.
    2. The Custom Data Indexer is found, and the Scope ("Message") is retrieved.
    3. The Computation Service provides an empty bin scoped by messageID.
    4. The extraction code from the Custom Data Indexer:
      1. Observes the CharacterClass key.
      2. Dynamically creates fields in the temp-bin, including one based off the Key field, specifically for CharacterClass. Let's call this dynamic field Key-CharacterClass.
      3. Determines that "Mage" is a String-type value.
      4. Dynamically creates a field in the temp-bin, based off the Value field, specifically for CharacterClass. Let's call this dynamic field Value-CharacterClass.
      5. Fills these newly-defined fields: Key-CharacterClass is assigned "CharacterClass", and Value-CharacterClass is assigned "Mage".
      6. Dynamically creates and fills additional field keys appropriately.
      7. Dynamically defines an Output Schema for these fields, with the name CharacterClass.
      8. Repeats this process for the Inventory key and its value.
    5. The Computation Service recognizes that the CharacterClass Output Schema (that we just defined!) has been satisfied.
      1. Key-CharacterClass, Value-CharacterClass, and the other dynamic fields defined in the Output Schema are selected for publishing.
      2. The Computation Service looks up any relevant Field Key Mappers and finds the CustomDataMapper, which maps Key-CharacterClass to Key, Value-CharacterClass to Value, etc.
      3. The permanent storage receives published data in the form {"Key": "CharacterClass", "Value": "Mage", ...}.
    6. Another publish operation occurs for the Inventory Output Schema.
      1. Key-Inventory, Value-Inventory, and the other dynamic fields defined in the Output Schema are selected for publishing.
      2. The Computation Service looks up any relevant Field Key Mappers and finds the CustomDataMapper, which maps Key-Inventory to Key, Value-Inventory to Value, etc.
      3. The permanent storage receives published data in the form {"Key": "Inventory", "Value": "1x Pickaxe", ...}.

Queries

Queries provide an abstraction layer between the database and clients of the Query API. Because of this, Queries have many opportunities, during the Query Phase, to transform values or even create new fields in the returned data, without altering the database.

Query Transforms

The values of fields in the database may be optimized for indexing and aggregation, but they might not be in the format we want to present to clients of the Query API. In these cases, we apply a Query Transform.

For instance, if we want to know how many unique users began sessions within a certain time frame, we need to store sets of hashed user identities within the database. But when it comes time to query this data, we don't want the set of hashes - we just want the number of unique users.

This case is handled by the Key Stats Query, by defining a simple delegate along with the field:

new FieldMetadata
{
    AssociatedFieldKey = _fieldKeyFactory.GetFieldKey
        (UserIndexer.Namespace, UserIndexer.UniqueUsersCounter),
    FieldName = "UniqueUsers",
    PostAggregationValueTransform = (md, counter) => (counter as HyperLogLog).Cardinality(),
    DataType = typeof(int),
    FriendlyName = "Unique Users"
}

Here, the value retrieved from the database is a HyperLogLog object, which is used within the database to approximate unique users. The output of this field should really be the cardinality of this database object, so we specify a PostAggregationValueTransform delegate to transform this field's value appropriately.

There are two times in the pipeline that a Query can apply a Transform: before rows are aggregated (merged), or after, as with the PostAggregationValueTransform shown above.

In this case, we choose to apply our transform after aggregation, rather than before. To understand why, consider a request by the Query API to get the unique users, grouped by day. The data retrieved from the database looks like this:

Time                UniqueUsers
2014-06-12 10:00    {A, B}
2014-06-12 13:00    {C, B}

Because our results ask for data grouped by day, these two rows will be merged during the aggregation process.

  • If we apply the transform before aggregation, these HyperLogLog objects will each be converted to the integer 2. Then, when we aggregate the rows, our result will be 4, which is wrong: there are only 3 unique users present.
  • If we apply the transform after aggregation, the aggregation will instead merge the actual HyperLogLog objects into {A, B, C}. Then, our transform will apply, converting the object to the integer 3, as the sketch after this list illustrates.
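The difference is easy to see with plain sets standing in for the HyperLogLog objects (a simplified sketch; the real objects are probabilistic counters rather than exact sets):

using System;
using System.Collections.Generic;
using System.Linq;

class TransformOrderingDemo
{
    static void Main()
    {
        // Two rows for the same day, each holding a set of hashed user identities.
        var row1 = new HashSet<string> { "A", "B" };
        var row2 = new HashSet<string> { "C", "B" };

        // Transform before aggregation: count each row first, then sum the counts.
        int wrong = row1.Count + row2.Count;            // 2 + 2 = 4 (user B counted twice)

        // Transform after aggregation: merge the rows first, then count once.
        int right = row1.Union(row2).Count();           // {A, B, C} -> 3

        Console.WriteLine("Pre-aggregation transform:  " + wrong);
        Console.WriteLine("Post-aggregation transform: " + right);
    }
}

Running this prints 4 for the pre-aggregation ordering and 3 for the post-aggregation ordering, matching the bullets above.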

Computed Fields

In addition to defining database-backed fields, Query Metadata may also define Computed Fields, whose values are determined at query-time based on the values of other fields.

public QueryMetadata QueryMetaData
{
    get
    {
        return new QueryMetadata
        {
            Name = "KeyStats",
            ComputedFields = new List<ComputedFieldMetadata>
            {       
                new ComputedFieldMetadata
                {
                    DataType = typeof(int),
                    FieldName = "ReturningUsers",
                    FriendlyName = "Returning Users",
                    Fields = new []{"NewUsers", "UniqueUsers"},
                    PostTransformComputation = dict => 
                    {
                        return (long)dict["UniqueUsers"] - (long)dict["NewUsers"];
                    }
                },
                // ... other computed fields omitted ...
            }
            // ... database-backed fields omitted ...
        };
    }
}

Here, we see a Computed Field, ReturningUsers, which relies on the NewUsers and UniqueUsers fields (both of which are database-backed). The actual computation is specified in PostTransformComputation, so that ReturningUsers = UniqueUsers - NewUsers.

There are two places in the pipeline that Computed Fields may be added to the results:

  • Post-Aggregation, which creates fields after the rows are merged but before Post-Aggregation Transforms, and
  • Post-Transform, which creates fields after all Transforms.

Note that Computed Fields cannot take Pivot Keys as input, only Data fields. However, Pivot Keys can themselves be transformed, which in some cases can produce a similar effect, as the sketch below illustrates.
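This sketch assumes that pivot-key fields can use the same FieldMetadata shape and PostAggregationValueTransform hook shown earlier in this section; the grouping logic itself is made up purely for illustration.

// Hypothetical sketch: collapse detailed Runtime values into a coarser label at query time,
// instead of computing a new field from the pivot key (which Computed Fields cannot do).
new FieldMetadata
{
    AssociatedFieldKey = _fieldKeyFactory.GetFieldKey
        (RuntimeIndexer.Namespace, RuntimeIndexer.RuntimeInformation),
    FieldName = "Runtime",
    FriendlyName = "Runtime",
    DataType = typeof(string),
    PostAggregationValueTransform = (md, value) =>
    {
        var runtime = value as string ?? String.Empty;
        // e.g., ".NET 4.5" and ".NET 4.6" would both become ".NET"
        return runtime.StartsWith(".NET") ? ".NET" : runtime;
    }
}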

Filterable Fields

It is sometimes convenient to provide clients of the Query API the ability to filter their results based on the value of certain field(s). The most obvious cases are Application and Date Range - e.g., to enable users of the Portal to only view data about a certain application, during a certain week. Other fields, like Operating System, can also be filterable. Filters are discussed in more detail as part of the Query API.

In order for a Query field to be filterable, the database field it references must be a Pivot Key. To see how a filter itself is declared, let's look at the OS Runtime Location Pattern's Runtime property (which is used in the ExtendQuery method):

private FieldMetadata Runtime
{
    get
    {
        return new FieldMetadata
        {
            AssociatedFieldKey = _fieldKeyFactory.GetFieldKey
                (RuntimeIndexer.Namespace, RuntimeIndexer.RuntimeInformation),
            FieldName = "Runtime",
            FriendlyName = "Runtime",
            DataType = typeof(string),
            LinkedFields = new[] { "OS", "Country", },
            Filters = new List<FilterMetadata>
            {
                new FilterMetadata
                {
                    Type = FilterType.Many.ToString(),
                    Filter = new PickFilter(new object[0])
                }
            }
        };
    }
}

There are a few additional properties we define for this FieldMetadata:

  • LinkedFields: see below.
  • Filters: the actual filter definitions. Each FilterMetadata object defines:
    • Type: what type of filter to use. In this case, Many is used to allow filtering by many different Runtime values.
    • Filter: the filter implementation (an object that implements AggregationInterfaces.Querying.Filters.IQueryFieldFilter). In this case, PickFilter is used.

The LinkedFields property deserves some additional explanation: it is used to determine which permanent-storage table the Query Web Service should provide to the pipeline. Because the OS Runtime Location Pattern always publishes its three fields together, any table that contains the Runtime field must also contain OS and Country. By declaring them as Linked Fields, the Query Web Service knows to look up a table that contains all three, even if only one of the fields is named in the request; without them, the Query Web Service may not find a suitable table and will return no results.
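Presumably the OS and Country fields defined by the same Pattern link back in the same way, so that a request naming any one of the three resolves to a table containing all of them. Here is a sketch following the Runtime property above; the OSIndexer field key is a hypothetical name, since those definitions are not excerpted here.

private FieldMetadata OS
{
    get
    {
        return new FieldMetadata
        {
            AssociatedFieldKey = _fieldKeyFactory.GetFieldKey
                (OSIndexer.Namespace, OSIndexer.OSInformation),   // hypothetical field key
            FieldName = "OS",
            FriendlyName = "Operating System",
            DataType = typeof(string),
            LinkedFields = new[] { "Runtime", "Country" }
        };
    }
}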



Workbench Version 1.2.0. Copyright © 2016 PreEmptive Solutions, LLC