WP7 Marketplace share (or how we became a victim of our own success)

September 22nd, 2011 by Sebastian Holst
This post tells the story of how good faith estimates of our WP7 marketplace penetration under-reported it by a factor of roughly five. This is not a “gotcha blog” – there are only good actors with the best of intentions in this story; but that’s why I think it’s a story worth telling.

You see, we don’t just obfuscate – we hide the fact that your app is obfuscated. We don’t just offer application instrumentation and monitoring, we inject that logic to simplify and streamline packaging and improve performance. …and therein lies the rub.

Something was in the air

While at Build soaking up the heady atmosphere of Windows 8 and all that’s coming with it, a few MVPs and also the Lowdermilk brothers took a moment to ask me if I’d read this awesome blog post where a developer had downloaded and analyzed all of the XAPs in the WP7 marketplace. Apparently the market insight was killer and covered everything from the most popular libraries to snagging cloned apps. No more guessing – the facts were all laid out. And they each mentioned to me how surprised and even a little worried they were by how few apps were being obfuscated. I didn’t think too much of it at the time as I am among the first to point out that not every app needs to be obfuscated – but I did make a mental note to be sure to check out the blog.

When I got back home, I finally had a chance to track this “all seeing blog” down – it was Justin Angel’s blog and the post was Windows Phone 7 Marketplace Statistics. And it really is a fascinating post with both initiative and insight; and then I got to the obfuscation section.

According to Justin’s analysis, only 3% of the apps in the market were obfuscated! And as I scanned down, there were comments to the effect of “gee, since no one else is doing it, perhaps I shouldn’t bother either.”

Even more surprising to me was the fact that our analytics (Runtime Intelligence) was not even listed in a very long list of third party tools – when I knew for a fact that we have nearly a thousand apps sending data.

This can’t be right! (and I was)

Given the nearly 6,000 downloads of Runtime Intelligence and Dotfuscator for WP7 and the activity that I had been seeing over the past year, these numbers just didn’t seem right. I wrote to Justin who was quick to share his detection logic (in fact he posted the source on his blog) and just as quick to invite any comments or refinements that I might have to offer.

To put a fine point on this, Justin was in no way defensive and was as interested in getting to the right answer as I was.

Hoist by our own petard


Without going into a lot of detail, Justin’s approach was to bust open the XAP and examine the various files and manifests to separate Silverlight from XNA and to identify the presence of third party tools. This approach proved to be effective because frameworks, tools and components leave behind files and other telltale fingerprints as a matter of course. There is one limitation though; this approach cannot detect when an application is modified or extended through IL manipulation or injection. And that’s exactly what we do.
From Build to Bill’ed (new numbers and how)

As I like to say, my ideas only have to be good, they don’t have to be original (trademark and patent laws notwithstanding) and so I did what I often do when confronted with a conundrum – I asked Bill Leach, our CTO, for help. He quickly (dare I say magically?) authored our own “marketplace crawler” that populated our own XAP repository. Rather than look at XAP contents at a component level, he wrote some code that examined the binaries themselves.
The first pass looked for the custom attribute DotfuscatorAttribute inside the binaries. This is a good way (but not an absolute way) to determine if a file has been processed by Dotfuscator (for either obfuscation or injection of analytics). It’s not infallible because developers can remove that attribute if they choose (to further cloak the fact that they have used Dotfuscator). Here is what we found:
We downloaded 26,159 XAP files and found 14.5% to have been processed by Dotfuscator.
This is basically 5X as many apps as Justin’s analysis had found (and that does not include the developers who configured Dotfuscator to remove the attribute we were searching for – so the number is certainly a bit higher).
In fact, we were surprised that Justin had found any at all – where did his 3% come from? Upon inspection, we think it’s an unexpected side-effect of how XAPs are assembled – there are some instances where the configuration file of Dotfuscator gets pulled into the XAP – this is unnecessary and should never happen. We will document this behavior and make sure that users know how to prevent this from happening. In short, his 3% showed the prevalence of a bug – not the use of Dotfuscator.
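For readers who want a feel for the first-pass check described above, here is a minimal sketch in Python. It is not Bill’s crawler: it is a crude byte search that assumes the attribute’s type name appears as plain UTF-8 text in the assembly metadata, and the function names (xap_has_marker, share_processed, make_xap) are made up for illustration. A real check would parse the metadata properly, since a raw byte match could also be triggered by an unrelated string.

```python
import io
import zipfile

# Assumption: the attribute's type name is stored as plain UTF-8 bytes
# in the assembly metadata, so a raw byte search can find it.
MARKER = b"DotfuscatorAttribute"


def xap_has_marker(xap_bytes, marker=MARKER):
    """Return True if any .dll inside the XAP (a ZIP archive) contains the marker."""
    with zipfile.ZipFile(io.BytesIO(xap_bytes)) as xap:
        for name in xap.namelist():
            # Only binaries count; config files or manifests are ignored.
            if name.lower().endswith(".dll") and marker in xap.read(name):
                return True
    return False


def share_processed(xaps):
    """Fraction of XAPs (a list of raw archive bytes) that contain the marker."""
    if not xaps:
        return 0.0
    return sum(xap_has_marker(x) for x in xaps) / len(xaps)


def make_xap(files):
    """Build an in-memory XAP from {file name: bytes}; used only for demos."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buf.getvalue()
```

Note that the sketch looks only inside the binaries, so a stray Dotfuscator configuration file sitting in the XAP would not count as a hit.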
To determine if an application was instrumented (rather than obfuscated), we applied some heuristics that are less obvious but can be shared if someone is interested (we looked for the existence of some high-level classes).

2.6% of marketplace apps are instrumented.
From my perspective, this is a low number – but to put it in perspective (or let’s be honest – I’m looking for the silver lining) we have a larger share than Google Analytics and AdMob but a slightly lower number than Flurry.
Attack of the clones
Just one more point to be made in this post. If one were to consider each family of cloned apps as essentially a single, re-skinned app – these numbers have the potential to change materially. We may take a look at that, but I think we have already gotten most of what we can from the static analysis of the marketplace.

So – is that the whole story? (Of course not)

Don’t give me (just) static

As interesting as the static analysis of the WP7 marketplace is (and it is), static analysis only gives us a backwards facing snapshot of what’s already been deployed. We get no insight into:

  • Best practices that we would want to replicate (which are different than common practices),
  • Developer motivations behind their development choices
  • Future trends especially when driven by new technology and market opportunities
In the context of Dotfuscator and Runtime Intelligence, I would want to know
  • Are the developers who built the 14.5% of “Dotfuscator processed” apps leaders or laggards?
  • Do they have special requirements that set them apart?
  • In short, do they have anything to teach the rest of us?
Want to know more about the 14.5% of apps and what they have to teach us?

Coming soon, WP7 Developer Survey Results. We’ve been running surveys since the WP7 launch last year (you can check out survey 1 and survey 2). As part of this ongoing effort, we have just closed out our third survey in the series and I will be posting results in the next few days – stay tuned!

SCHNELL! Or why patience is a virtue except when testing on Windows Phone

August 20th, 2011 by Sebastian Holst

Mystery solved! As I had promised in my last blog entry, I added exception reporting to my two apps, A Pose for That and Yoga-pedia to determine exactly what was going on with the exceptions that the Microsoft Marketplace was reporting but I had never seen. I needed to know:

  1. The cause of the exceptions (the stack traces were too cryptic for me to figure out)
  2. How to fix the problem(s)
  3. Why I have never seen a crash even though there is little doubt that they are indeed happening out there in the wild. If I can’t be confident that my testing is complete, I can never be confident that my app will behave when it matters most – in production.

Remember these three objectives because you will be tested later.

Now, I know that my entries are sometimes kind of long – so here are the conclusions…. And if you want to know how I back them up – then (hopefully) you will enjoy the rest of the post.

(PLUS, there’s a teaser at the very end).

Conclusions:

  1. Always account for “loss of context” in WP7 apps – probably the try-catch is the best approach but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an InvalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

And here’s the how and why I have come to these conclusions…


HOW – first I had to add my own exception reporting.

Exception Reporting: Adding exception reporting with Runtime Intelligence is very simple. All I had to do was add one exception reporting attribute as follows (from within Dotfuscator for Windows Phone)


Note that in the properties of this attribute I am asking that the ExceptionExtendedData method be run. A Runtime Intelligence system probe attribute works fine during normal operations, but if I want custom data after an unhandled exception, this is a more reliable technique. Here is the method that I put in the App class:



As a side note, if I wanted to track thrown or handled exceptions, I could place the exception attribute down at the method level to get much more targeted data. Anyhow, after this simple step, I deployed the re-instrumented app to the marketplace and (sadly) watched the exceptions roll in…


Runtime Intelligence Exception reporting

Logging into my Runtime Intelligence portal account, selecting the date range I was interested in, and then selecting “Exceptions” presents me with the following:



I can see the total exceptions over time; the type of exceptions (I am only getting one – and that seems like it might be good news) and I have a list of all of the specific exceptions on the right. Clicking on any one of these shows me the detail as follows:

The graphic above shows screen captures from three different stack traces.

Good news item 1 is that (unlike the marketplace stack traces), I can see the diagnostic message. This may not mean much to the serious developers who enjoy offsets and cryptic traces – but I need these to go back to MSDN and other resources to see what is really going on and what I can do about them.

It turns out that there were seven different exceptions coming from my app – BUT ALL OF THEM HAD TO DO WITH TIMING – not some error in my general logic (in other words, I’m not dividing by zero or trying to display a non-existent image, etc.). For some reason my app is getting vertigo in my customers’ hands and losing track of which page is current, resulting in any number of InvalidOperationExceptions.

Good news item 2 is that there is a pretty standard way to manage this behavior; the try-catch statement. I’m in no position to explain how this works, but visit the link above for a great explanation.

So with basic Runtime Intelligence exception reporting I have addressed my first two requirements; to diagnose my app’s problem and identify a fix. BUT – I have not addressed the deeper and perhaps more troubling issue of why I have never seen this problem myself – what’s this all about? If I can’t improve my quality control, I can never feel comfortable that my app will perform in the wild as it does for me.

Good news item 3 is that I have Runtime Intelligence to give me EVEN MORE context on my app and my users. The fundamental flaw in almost every exception handling solution I have ever seen is that they (by necessity) can only look at the app when exceptions occur – they are too heavy-weight and/or too invasive to run all the time everywhere – not so with Runtime Intelligence.

If you ONLY have exception data, you are robbed of one of the most effective diagnostic heuristics available – the process of comparing populations in order to identify material differences between them and thus leading to a likely root cause. This is the fastest and cheapest way to figure out why I had never seen a crash.

What I did next was to compare the set of users who experienced exceptions with the general population of users and myself – was there something specific about their phones? Their software? Their behavior?

It turns out that the answers to these three questions are no, no and YES!

Process of elimination: First, I compared the system data (the fields defined in ExceptionExtendedData above) for the users and phones that had exceptions with the same data for the general population… I won’t bore you with all of the metrics I was able to eliminate, but I will show one: manufacturer.



The two pie charts show the relative percentages of manufacturers in the general population of my users with the population that had exceptions – one can eyeball these and pretty quickly see that there is virtually no difference. The bar chart puts a fine point on this by showing the relative difference in share; Dell had only 1% of the total share and was not statistically significant – looking at the other three manufacturers, we can see that there is no more than a 20% variance between the two populations. This kind of range was consistent across all of the metrics I had been collecting except one.
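The comparison of shares described above is simple arithmetic, and a small sketch may make it concrete. This is an illustration in Python rather than anything the Runtime Intelligence portal does for you; the function names and the group labels in the usage note are hypothetical.

```python
def shares(counts):
    """Convert raw counts per group into each group's share of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute shares of an empty population")
    return {group: n / total for group, n in counts.items()}


def over_representation(exception_counts, population_counts):
    """Ratio of a group's share among exception users to its share overall.

    A value near 1.0 means the group crashes about as often as its size
    would suggest; a value of 4.0 means four times its expected share.
    Groups absent from the exception data get a ratio of 0.0.
    """
    exc = shares(exception_counts)
    pop = shares(population_counts)
    return {
        group: exc.get(group, 0.0) / pop_share
        for group, pop_share in pop.items()
        if pop_share > 0
    }
```

With made-up counts, over_representation({"A": 50, "B": 50}, {"A": 900, "B": 100}) rates group B at roughly 5.0, because B supplies half of the exceptions but only a tenth of the users. The same function works for any metric: manufacturer, carrier, OS version or culture.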

Schnell!

In my last blog I had noted that there had appeared to be a disproportionate percentage of German speaking users in the exception population and it turns out that this was not a random blip – it showed up again in this latest exception data as follows:

The top bar chart shows the relative percentage of users by culture that experienced exceptions alongside the relative percentage of that culture in the general population. The second bar chart shows the relative difference in share by culture and it is truly surprising (at least to me).

Germans crashed my app 13X more often than the norm; Austrians and the Dutch crashed it at 4X what their relative share would suggest, with the Malaysians right behind.

Given the relative distance between these populations and the different carriers and jurisdictions that these populations live under, it seems pretty clear that what these users have in common is their behavior. These users are simply more impatient than the rest of my users. They hit the “show pose” or “take me to the marketplace” or whatever more quickly and more often and so they are that much more likely to cause my app to lose its place.

Not only am I more patient (being an American and at one with the universe ;), but because I know my app and the areas where it may take a beat (or two) to respond – I naturally did not repeat my commands impatiently at those critical times – and therefore, I did not crash my app! Mysteries solved!

Conclusions: (AGAIN)

  1. Always account for “loss of context” in WP7 apps – probably the try-catch is the best approach but I will defer to “real developers” for the specific strategy. At least with Silverlight, impatient users can always force your app into an InvalidOperationException.
  2. Culture matters in both user preferences and user expectations (and therefore user satisfaction). If at all possible, represent all relevant cultures in your test populations. How do you know what the relevant populations are? Analytics of course…
  3. Software quality, user experience and user profile are all intimately connected. Systems that only monitor user behavior (marketing) or only profile software stability (debugging) or only profile runtime configurations (marketplaces) are inherently weaker than an approach that accounts for the influence that each has on the others.
  4. Without Runtime Intelligence (or another comparable application analytics solution), no development team can be confident in either the quality of their app or their users’ experience.

TEASER – WOULDN’T IT BE AWESOME IF WE COULD DO ALL OF THIS PROFILING AND EXCEPTION ANALYSIS WITH HTML5/JAVASCRIPT TOO? STAY TUNED (IN)!

Runtime Intelligence Portal 2.0 now live for commercial users

July 22nd, 2011 by Brandon Siegel

The next version of PreEmptive’s Runtime Intelligence portal was rolled out for commercial Runtime Intelligence users Friday afternoon. The underlying infrastructure has been rebuilt from the ground up to provide better performance, reliability, and security. More details of what’s new in Runtime Intelligence Portal 2.0 are available in the original announcement blog entry.

As always, we deeply value your feedback. You can directly reach the Runtime Intelligence portal dev team by leaving a comment on this blog entry, posting a message on the Commercial RI Portal forum, or tweeting us @PreEmptive.

The new WP7 App Hub reporting is great – and it’s even better with analytics!

July 19th, 2011 by Sebastian Holst

Warning – this is a cliff hanger post. If you don’t like mysteries, come back in two weeks…

Like anyone else who has an app inside the WP7 App Marketplace, I noticed that the App Hub was down most of yesterday with the promise of a functional upgrade in the works – and today I was very pleasantly surprised to see the result; a streamlined experience with expanded capabilities.

One of the first things that caught my attention was the exception reporting by app and by date; very useful indeed. Of course, MSFT is quick to point out that (and I quote) “Crash count alone isn’t a direct measure of app quality. Popular apps may have higher crash counts due to higher usage.”

Well that seems self-evident, but without usage metrics how can I evaluate the severity of my exception report counts? …. (and now, unless this is the first post of mine that you have ever read, you must know what’s coming).

To the cloud! (Sorry, I couldn’t resist). Using Runtime Intelligence for Windows Phone, I’m able to measure total sessions – by extracting these counts by day and mashing them up with exception counts from the marketplace – I can now supply the missing ingredient to make the exception count on the App Hub meaningful. (NOTE – I had to manually transcribe exception counts from the App Hub as there is no tabular option and the detailed download drops the daily count as it de-dupes the exceptions).

The App Hub is careful to point out that only apps running NODO (or Mango) can report exceptions, so I first had to remove the Runtime Intelligence session data coming from earlier versions of WP7 (an interesting statistic on its own).

Here is what I see… (and a warning here – the numbers aren’t pretty)

I took two apps of mine; Yoga-pedia and A Pose for That and looked at their respective usage on NODO+ phones via Runtime Intelligence and exception reports from the App Hub and then calculated the ratio of sessions to exceptions.

The time period I used for this test was the two weeks from June 12 to June 25. During that time, this is what I observed:

  • 66% of A Pose for That sessions were run on NODO.
  • 58% of Yoga-pedia sessions were run on NODO.

Here is the ratio of exceptions reported by MSFT and sessions from Runtime Intelligence…

Ratio of session counts and exception counts by day
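For anyone who wants to reproduce this kind of mash-up with their own numbers, the steps above (drop the sessions from phones that cannot report exceptions, then divide daily exceptions by daily sessions) can be sketched in a few lines of Python. The record layout, the OS labels and the function name are assumptions made for illustration; neither the App Hub nor Runtime Intelligence exports data in exactly this shape.

```python
from collections import Counter

# Hypothetical labels for the OS releases that can report exceptions to the App Hub.
REPORTING_OS = {"NODO", "Mango"}


def daily_exception_rate(sessions, exceptions_by_day):
    """Exceptions per eligible session, by day.

    sessions: iterable of (day, os_label) tuples, one per session.
    exceptions_by_day: {day: exception count transcribed from the App Hub}.
    Days with no eligible sessions are left out rather than divided by zero.
    """
    eligible = Counter(day for day, os_label in sessions if os_label in REPORTING_OS)
    return {
        day: exceptions_by_day.get(day, 0) / count
        for day, count in sorted(eligible.items())
    }
```

A result of 0.1 for a given day would correspond to the “1 in 10” crash rate discussed below. Leaving the older phones in the denominator would make the apps look healthier than the App Hub data can actually support.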

Now there are three likely scenarios here.

  1. Over this two week period, both apps were crashing 1 in every 10 times they were run (HORRIBLE). I don’t think this is the case because I have run these apps myself on multiple phones hundreds of times and they have NEVER crashed.
  2. The App Hub is over-reporting exceptions (or somehow incorrectly associating exceptions with these apps). This is a beta feature on the App Hub – it’s certainly possible.
  3. Runtime Intelligence is way under-reporting the total number of sessions in a given day. Certainly possible, but given the unit testing I have done, I don’t see this as being a major contributing factor to these ratios – but certainly a possibility.


Now, I had already put a “feature tick” on the default unhandled exception handler to count how many times it was invoked during this same period. The counts I have are well below the App Hub numbers (which might suggest number 2 above is the culprit – BUT NOT SO FAST). It is more than likely that certain exceptions (perhaps a majority) would interrupt the normal feature tracking transmission mechanism so I would expect that count from Runtime Intelligence to be artificially LOW.

As is often the case when managing an application “in the wild”, an unanticipated question has arisen and I find that I don’t have enough data. That’s why it’s ALWAYS so important to

  • plan in advance what data is worth collecting to minimize the likelihood that you will end up in this situation and
  • be sure that your analytic solution supports rapid and easy iterations and refinements to compensate for when your planning falls short.


So how am I going to determine if

  1. my apps offer a LOUSY customer experience everywhere except for my personal phones or
  2. one or both exception reporting counts and session tracking counts are flawed?

Easy - I’m going to post an update of my apps to the marketplace this weekend with Runtime Intelligence Exception reporting turned on. What?

Runtime Intelligence for Windows Phone includes its own exception tracking capabilities – it does require that the developer activate it (that’s why I don’t have that data now), but it offers a lot more data and it can be invoked for unhandled, handled, and thrown exceptions. Further, it can be configured to collect additional information (custom for the app), AND it can be extended to offer the user a dialogue to provide additional feedback if they like.

I will post my results over the next few weeks – meanwhile, if anyone has any suggestions or ideas – please let me know… I honestly have no idea how this little mystery will play itself out.

Before I sign off – here is one more tantalizing clue (although it may also be a red herring). When I look at the limited unhandled exception data currently being returned by Runtime Intelligence (I can see tower location, device manufacturer, OS, etc.), I see that well over 50% of the phones that had an exception were localized to a language OTHER THAN en-US – and that is way out of proportion to the actual usage trends that I have been tracking (and posted in earlier entries). Further, the localizations that had the greatest “disproportionate” number of unhandled exceptions were de-DE and de-AT. Coincidence? Conspiracy? We don’t need to guess – we will soon have the facts!

PS here are two links that may be of interest:


Enjoy!

A Webinar on Monetizing Mobile Apps with Analytics

June 25th, 2011 by Sebastian Holst
For those who want a little more detail on the specific coding steps as well as an update on the latest application of analytics to mobile app development, we’ve scheduled a webinar. Here’s the info….. (the first time slot of 100 filled up in a few hours - so these are additional dates. We’ll keep scheduling these as long as interest is there) Cheers.

Title: Monetize Mobile Apps with Analytics
Registration: Thurs, Aug 4th at 11:30 EDT https://www3.gotomeeting.com/register/676857398
Registration: Wed, Aug 17th at 12:30 EDT https://www3.gotomeeting.com/register/433412582

Description: In this 60 minute webinar, we will take a live WP7 app and use real-world analytics to illustrate:

  1. The impact of try/buy scenarios on paid apps
  2. The relationship between free and paid versions of an app
  3. Strategies for ad-driven app design that consider page location, first time, occasional, and power user patterns, cultural trends, and other demographics including carrier and model profiling.

Preparation: NONE required. However, attendees are likely to get more from the presentation if they have already:
• Installed and are familiar with the Microsoft Windows Phone 7 development tools
• Installed and have some familiarity with PreEmptive Solutions Runtime Intelligence
• Installed and navigated around the free SKU of the sample application that will be referenced in the presentation. The free app is Yoga-pedia.