SCHNEL! Or why patience is a virtue except when testing on Windows Phone

August 20th, 2011 by Sebastian Holst

Mystery solved! As I had promised in my last blog entry, I added exception reporting to my two apps, A Pose for That and Yoga-pedia to determine exactly what was going on with the exceptions that the Microsoft Marketplace was reporting but I had never seen. I needed to know:

  1. The cause of the exceptions (the stack traces were too cryptic for me to figure out)
  2. How to fix the problem(s)
  3. Solve the mystery as to why I have never seen a crash even though there is little doubt that they are indeed happening out there in the wild. If I can’t be confident that my testing is complete, I can never be confident that my app will behave when in matters most – in production.

Remember these three objectives because you will be tested later.

Now, I know that my entries are sometimes kind of long – so here are the conclusions…. And if you want to know how I back them up – then (hopefully) you will enjoy the rest of the post.

(PLUS, there’s a teaser at the very end).

Conclusions:

  1. Always account for “loss of context” in WP7 apps – probably the try-catch is the best approach but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an invalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

And here’s the how and why I have come to these conclusions…


HOW – first I had to add my own exception reporting.

Exception Reporting: Adding exception reporting with Runtime Intelligence is very simple. All I had to do was add one exception reporting attribute as follows (from within Dotfuscator for Windows Phone)


Note that in the properties of this attribute I am asking that the method ExceptionExtendedData method be run. A Runtime Intelligence system probe attribute works fine during normal operations, but if I want custom data after an unhandled exception, this is a more reliable technique. Here is the method that I put in the App class:



As a side note, if I wanted to track thrown exceptions or handled, I could place the exception attribute down at the method level to get much more targeted data. Anyhow, after this simple step, I deployed the re-instrumented app to the marketplace and (sadly) watched the exceptions roll in…


Runtime Intelligence Exception reporting

Logging into my Runtime Intelligence portal account and selecting the date range I was interested in and then selecting “Exceptions”, presents me with the following:



I can see the total exceptions over time; the type of exceptions (I am only getting one – and that seems like it might be good news) and I have a list of all of the specific exceptions on the right. Clicking on any one of these shows me the detail as follows:

The graphic above shows screen captures from three different stack traces.

Good news item 1 is that (unlike the marketplace stack traces), I can see the diagnostic message. This may not mean much to the serious developers who enjoy offsets and cryptic traces – but I need these to go back to MSDN and other resources to see what is really going on and what I can do about them.

It turns out that there were seven different exceptions coming from my app – BUT ALL OF THEM HAD TO DO WITH TIMING – not some error in my general logic (in other words, I’m not dividing by zero or trying to display a non-existent image, etc.). For some reason my app is getting vertigo in my customers’ hands and losing track of what page was current resulting in any number of “InvalidOperationException.”

Good new item 2 is that there is a pretty standard way to manage this behavior; the try-catch statement. I’m in no position to explain how this works, but visit the link above for a great explanation.

So with basic Runtime Intelligence exception reporting I have addressed my first two requirements; to diagnose my app’s problem and identify a fix. BUT – I have not addressed the deeper and perhaps more troubling issue of why I have never seen this problem myself – what’s this all about? If I can’t improve my quality control, I can never feel comfortable that my app will perform in the wild as it does for me.

Good new item 3 is that I have Runtime Intelligence to give me EVEN MORE context on my app and my users. The fundamental flaw in almost every exception handling solution I have ever seen is that they (by necessity) can only look at the app when exceptions occur – they are too heavy-weight and/or too invasive to run all the time everywhere – no so with Runtime Intelligence.

If you ONLY have exception data, you are robbed of one of the most effective diagnostic heuristics available – the process of comparing populations in order to identify material differences between them and thus leading to a likely root cause. This is the fastest and cheapest way to figure out why I had never seen a crash.

What I did next was to compare the set of users who experienced exceptions with the general population of users and myself – was there something specific about their phones? Their software? Their behavior?

It turns out that the answers to these three questions are no, no and YES!

Process of elimination: First, I compared the system data of exception users and phone with the general population as defined in ExceptionExtendedData defined above… I won’t bore you with all of the metrics I was able to eliminate, but I will show one; manufacturer.



The two pie charts show the relative percentages of manufacturers in the general population of my users with the population that had exceptions – one can eyeball these and pretty quickly see that there is virtually no difference. The bar chart puts a fine point on this by showing the relative difference in share; Dell had only 1% of the total share and was not statistically significant – looking at the other three manufacturers, we can see that there is no more than a 20% variance between the two populations. This kind of range was consistent across all of the metrics I had been collecting except one.

Schnel!

In my last blog I had noted that there had appeared to be a disproportionate percentage of German speaking users in the exception population and it turns out that this was not a random blip – it showed up again in this latest exception data as follows:

The top bar chart shows the relative percentage of users by culture that experienced exceptions alongside the relative percentage of that culture in the general population. The second bar chart shows the relative difference in share by culture and it is truly surprising (at least to me).

Germans crashed my app 13X more often than norm, Austrians and the Dutch crashed the apps 4X what their relative share would suggest with the Malaysians right behind.

Given the relative distance between these populations and the different carriers and jurisdictions that these populations live under, it seems pretty clear that what these users have in common is their behavior. These users are simply more impatient than the rest of my users. They hit the “show pose” or “take me to the marketplace” or whatever more quickly and more often and so they are that much more likely to cause my app to lose its place.

Not only am I more patient (being an American and at one with the universe ;), but because I know my app and the areas where it may take a beat (or two) to respond – I naturally did not repeat my commands impatiently at those critical times – and therefore, I did not crash my app! Mysteries solved!

Conclusions: (AGAIN)

  1. Always account for “loss of context” in WP7 apps – probably the try-catch is the best approach but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an invalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

TEASER – WOULDN’T BE AWESOME IF WE COULD DO ALL OF THIS PROFILING AND EXCEPTION ANALYSIS WITH HTML5/JAVASCRIPT TOO? STAY TUNED (IN)!

Runtime Intelligence Portal 2.0 now live for commercial users

July 22nd, 2011 by Brandon Siegel

The next version of PreEmptive’s Runtime Intelligence portal was rolled out for commercial Runtime Intelligence users Friday afternoon. The underlying infrastructure has been rebuilt from the ground up to provide better performance, reliability, and security. More details of what’s new in Runtime Intelligence Portal 2.0 are available in the original announcement blog entry.

As always, we deeply value your feedback. You can directly reach the Runtime Intelligence portal dev team by leaving a comment on this blog entry, posting a message on the Commercial RI Portal forum, or tweeting us @PreEmptive.

The new WP7 App Hub reporting is great – and it’s even better with analytics!

July 19th, 2011 by Sebastian Holst

Warning – this is a cliff hanger post. If you don’t like mysteries, come back in two weeks…

Like anyone else who has an app inside the WP7 App Marketplace, I noticed that the App Hub was down most of yesterday with the promise of a functional upgrade in the works – and today I was very pleasantly surprised to see the result; a streamlined experience with expanded capabilities.

One of the first things that caught my attention was the exception reporting by app and by date; very useful indeed. Of course, MSFT is quick to point out that (and I quote) “Crash count alone isn’t a direct measure of app quality. Popular apps may have higher crash counts due to higher usage.

Well that seems self-evident, but without usage metrics how can I evaluate the severity of my exception report counts? …. (and now, unless this is the first post of mine that you have ever read, you must know what’s coming).

To the cloud! (Sorry, I couldn’t resist). Using Runtime Intelligence for Windows Phone, I’m able to measure total sessions – by extracting these counts by day and mashing it up with exception counts from the marketplace – I can now supply the missing ingredient to make the exception count on the App Hub meaningful. (NOTE – I had to manually transcribe exception counts from the App Hub as there is no tabular option and the detailed download drops the daily count as it de-dupes the exceptions).

The App Hub is careful to point out that only apps running NODO (or Mango) can report exceptions, so I first had to remove the Runtime Intelligence session data coming from earlier versions of WP7 (an interesting statistic on its own).

Here is what I see… (and a warning here – the numbers aren’t pretty)

I took two apps of mine; Yoga-pedia and A Pose for That and looked at their respective usage on NODO+ phones via Runtime Intelligence and exception reports from the App Hub and then calculated the ratio of sessions to exceptions.

The time period I used for this test was the two weeks from June 12 to June 25. During that time, this is what I observed:

  • 66% of A Pose for That sessions were run on NODO.
  • 58% of Yoga-pedia sessions were run on NODO.

Here is the ratio of exceptions reported by MSFT and sessions from Runtime Intelligence… (click to enlarge)

Ratio of session counts and exception counts by day

Now there are three likely scenarios here.

  1. Over this two week period, both apps were crashing every 1 in 10 times they were run (HORRIBLE). I don’t think this is the case because I have run these apps myself on multiple phones hundreds of times and they have NEVER crashed.
  2. The App Hub is over-reporting exceptions (or somehow incorrectly associating exceptions with these apps). This is a beta feature on the App Hub – it’s certainly possible.
  3. Runtime Intelligence is way under-reporting the total number of sessions in a given day. Certainly possible, but given the unit testing I have done, I don’t see this as being a major contributing factor to these ratios – but certainly a possibility.


Now, I had already put a “feature tick” on the default unhandled exception handler to count how many times it was invoked during this same period. The counts I have are well below the App Hub numbers (which might suggest number 2 above is the culprit – BUT NOT SO FAST). It is more than likely that certain exceptions (perhaps a majority) would interrupt the normal feature tracking transmission mechanism so I would expect that count from Runtime Intelligence to be artificially LOW.

As is often the case when managing an application “in the wild”, an unanticipated question has arisen and I find that I don’t have enough data. That’s why its ALWAYS so important to

  • plan in advance what data is worth collecting to minimize the likelihood that you will end up in this situation and
  • be sure that your analytic solution supports rapid and easy iterations and refinements to compensate for when your planning falls short.


So how am I going to determine if

  1. my apps offer a LOUSY customer experience everywhere except for my personal phones or
  2. one or both exception reporting counts and session tracking counts are flawed?

Easy - I’m going to post an update of my apps to the marketplace this weekend with Runtime Intelligence Exception reporting turned on. What?

Runtime Intelligence for Windows Phone includes its own exception tracking capabilities – it does require that the developer activate it (that’s why I don’t have that data now), but it offers a lot more data and it can be invoked for unhandled, handled, and thrown exceptions. Further, it can be configured to collect additional information (custom for the app), AND it can be extended to offer the user a dialogue to provide additional feedback if they like.

I will post my results over the next few weeks – meanwhile, if anyone has any suggestions or ideas – please let me know… I honestly have no idea how this little mystery will play itself out.

Before I sign off – here is one more tantalizing clue (although it may also be a red herring). When I look at the limited unhandled exception data currently being returned by Runtime Intelligence (I can see tower location, device manufacturer, OS, etc.), I see that well over 50% of the phones that had an exception were localized to a language OTHER THAN en-US – and that is way out of proportion to the actual usage trends that I have been tracking (and posted in earlier entries). Further, the localizations that had the greatest “disproportionate” number of unhandled exceptions were de-DE and de-AT. Coincidence? Conspiracy? We don’t need to guess – we will soon have the facts!

PS here are two links that may be of interest:


Enjoy!

A Webinar on Monetizing Mobile Apps with Analytics

June 25th, 2011 by Sebastian Holst
For those who want a little more detail on the specific coding steps as well as an update on the latest application of analytics to mobile app development, we’ve scheduled a webinar. Here’s the info….. (the first time slot of 100 filled up in a few hours - so these are additional dates. We’ll keep scheduling these as long as interest is there) Cheers.

Title: Monetize Mobile Apps with Analytics
Registration: Thurs, Aug 4th at 11:30 EDT https://www3.gotomeeting.com/register/676857398
Registration: Wed, Aug 17th at 12:30 EDT https://www3.gotomeeting.com/register/433412582

Description: In this 60 minute webinar, we will take a live WP7 app and use real-world analytics to illustrate:

  1. The impact of try/buy scenarios on paid apps
  2. The relationship between free and paid versions of an app
  3. Strategies for ad-driven app design that consider page location, first time, occasional, and power user patterns, cultural trends, and other demographics including carrier and model profiling.

Preparation: NONE required. However, attendees are likely to get more from the presentation if they have already:
• Installed and are familiar with the Microsoft Windows Phone 7 development tools
• Installed and have some familiarity with PreEmptive Solutions Runtime Intelligence
• Installed and navigated around the free SKU of the sample application that will be referenced in the presentation. The free app is Yoga-pedia.

Improving Ad performance: Correlating ad activity with feature usage and user behavior

June 24th, 2011 by Sebastian Holst

In this third installment on application analytics patterns and practices I’m going to focus on how Runtime Intelligence can be used to shed light on Ad activity within the context of one or more applications. While the use cases covered here are nowhere near exhaustive, I’m going to show how to answer the following questions (and hopefully give some indication as to why you may care about the answers):

  • What are ad impression volumes across multiple apps?
  • What are the click-through rates (the ratio of users clicking on ads to the volume of impressions) across various pivots?
  • What influence does culture (country of origin) have on click-through rates, e.g. are Germans more or less likely to click on ads versus Italians?
  • What carriers/ISP providers are giving me the most business, e.g. where are my users most likely to be found?
  • Where are users spending most of their time inside an app? Does that usage pattern correlate with a user’s likelihood of clicking on an ad?
  • Do returning users interact with ads differently than first time users or power users?
Many of these metrics are valuable in scenarios other than ad effectiveness of course (knowing where users spend their time and understanding how power users behave are two obvious examples), but for this installment, I am going to focus exclusively on how Ad interaction can be viewed across these metrics.

Implementation
I’m using the same trusty one line method WhatPoseWhen that I described in the first installment – this time, I call the method on the New Ad event (to count impressions) and the Ad Engaged event (to count clicks on ads). I could just as easily collect data on any other ad-related event and grab any data that is available to the program at that point in its execution as well. Here is the code for that method in its entirety:

private void WhatPoseWhen (string page, string selection)
{ return; }

The first parameter tells me from which page the method is being called and the second parameter tells me why I might care, e.g. was a new ad displayed, etc.)

I pass the page name and the ad event into WhatPoseWhen and Runtime Intelligence grabs these parameters and sends them up to the repository (no programming for this). I can then correlate the ad activity within the context of sessions, feature usage, and runtime stack data that I am getting as a part of runtime intelligence.

For these metrics, I export my CSV data into a regular excel spreadsheet and then generate the pivot tables shown below.

App background
I always like to use data from true production apps rather than fabricate data sets; I am using two apps that I wrote and launched on the marketplace that are both ad driven, Yoga-pedia and A Free WPC Yogi – the former is a free version of a yoga app that (hopefully) helps to drive sales of a for an upgrade to A Pose for That. A Free WPC Yogi plays a similar role for The WPC Yogi, a tailored version of A Pose for That targeting WPC 2011 attendees.

The following post uses their Ad activity over the same one week period.

Impression counts
The following pie chart shows the “new ad” event count by application. As you can see, Yoga-pedia has roughly 4X the number of ad impressions and given the fact that these apps are very similar (but not identical) in their behavior, this also roughly correlates to the volume of usage as well.

Click-through rates
However, when I divide the total number of “ad engaged” events by the total number of “new ads,” I see that A Free WPC Yogi has a 28% higher click-through rate (1.78% versus 1.37). In point of fact the demographics of the app users are quite different (randomized consumers versus MSFT partners who are attending WPC 2011).

Advantage: This intelligence helps to segment users by differences in their behavior and to do a better job of targeting those differences across apps.

Impressions by country (or culture)
Runtime Intelligence can grab the IP address of the sending tower – this is not personally identifiable and cannot be used to locate an individual with any precision – but it is more than adequate to identify country, state, and city. In the following graph, I simply count new ad events by country and show the top 10 countries by impression volume.

Advantage: If your app has a cultural bias that would benefit from localization, understanding where your users are can help prioritize those localization efforts.

Click-through rates by country (or culture)
The following bar chart calculates the click-through rates for the top 10 countries listed above. What is interesting here is that there appears to be a significant difference in click through rates by country (culture).

Advantage: Understanding when/if users from specific cultures are significantly more likely to respond to (click on) ads can further help to prioritize localization or marketing investments.

Impressions by ISP provider (top 25)
To produce the next graph, I used an application to tell me who owned the IP addresses that my mobile clients are using (I used IP2Location – but there are many of them out there).

This is a nice way to see who my users favor in terms of their carrier. Here I only show the top 25.

Advantage: Understanding carrier popularity will help focus business development/marketing efforts and better manage potential risks associated with how your users may be negatively impacted by upgrade schedules (delays). Will your next app be dependent upon Mango?

Sessions per app page
In the raw CSV files that can be exported from the Runtime Intelligence portal, there is a column, ApplicationGroupId. The value in this column is unique for all signals (messages) that are sent from within a single app session. In other words, I can use this field to organize all user activity into the relative user sessions using this field. This is helpful for plotting specific user patterns.

The following graph simply counts the unique occurrences of ApplicationGroupId values by page name value (recall that this is the first parameter of the WhatPoseWhen method). This avoids counting multiple views of a single page within a single session and tells me how popular specific pages are across my user base. For this posting and for illustration, I’m only showing data for five specific pages.

FindAPoseDetail and BrowseSelectPose are central to the user experience (browsing for yoga poses and then drilling into a specific pose for detailed imagery and instruction). TellMeMore is the page where I describe what comes with the paid version of the app (nice to see that 10% of my users deliberately choose to investigate the upgrade possibility) and AppGuide and TopicList are essentially app documentation and I can see that these pages are not hit very often – and that’s not a bad thing – users should not need to use the documentation after their first use.

So – this graph is telling me that

a) My users are spending their time using the app rather than trying to use the app
b) I am at least getting my user’s attention regarding a possible upgrade – perhaps my content is not compelling enough if my conversion rate does not correlate.

Advantage: broad user proofing can be used to validate developer assumptions about user experience and effectiveness of pages for their specific purpose.
Ads shown per page compared to volume of times viewed
Next I calculate the average number of ads shows per page by dividing the total count of New Ad messages by page (this combines the two parameters, page name and the even New Ad) by the total count of the times the page is shown. TOTAL ADS SHOWN PER PAGE / TOTAL TIMES PAGE VISITED

I use the same ad duration interval across all of my pages – so this is actually another means of calculating how much time my users are spending on each page (this can be done with Runtime Intelligence alone, but in this case, I don’t have to do that).

The graph below shows the average number of ads shown per page and maps them to where they rank in terms of how often the page is visited.

Happily, the two core pages of my app also get the most ads (and are also where my users are stopping to spend time). I can also see that users spend more time on detailed pose descriptions than they do browsing – even though the browse more often than they drill down (which makes perfect sense).

Sadly, my upsell page is getting the least love – I definitely have to work on making this page more engaging.

Advantage: Ad frequency by page provides insight into where users spend their time. Calculating click-through rates by page identifies where users stop to look around and may be most open to suggestion.

Returning users and sessions per user
Another column in the CSV extract is the ANID – this is either the result of hashing the true ANID from a user’s phone (it is not the actual ANID value), or, if they opt-out of that, it will contain a GUID generated by our software and written to isolated storage. In either case, this value acts as a unique user identifier.

The ANID can be used to identify new and returning users. Dividing session count (ApplicationGroupId) by ANID gives the average number of sessions per user. The following bar chart takes the 10 ANIDs with the highest session counts and compares the resulting sessions per user value to the rest of the user base (whose count is roughly 500 other users).

What I see is that there is a core group of users that are heavily using my apps (YAY!). Now that I know who they are, I can zero in on their specific behaviors, how they relate to my ads, what features they use most heavily, etc.

Advantage: Segmenting users into new, returning, and power categories dramatically improves a developer’s ability to target, prioritize, and validate development, marketing, and support activities.
Conclusion

I hope to have shown how using Runtime Intelligence, developers can materially improve their ability to build more effective applications and refine their advertising strategy while coordinating that strategy with complimentary upsell strategies as well.

Advantage: Development!