Application Reliability in Windows Store Apps
The number one reason applications get bad reviews in the Windows Store is reliability: freezes and crashes. This is far more important to users than battery life, speed, or the number of advertisements plastered on the screen. And reviews are the only thing users see just prior to installation.
Harry Pierson argues that application reliability is not a binary state. Applications are not simply reliable or unreliable; rather, reliability is a continuous process. No matter how much you test, there will still be bugs that occur only on the user’s machine. To this end, monitoring is crucial.
The first step is to review the application analytics and telemetry collected by Microsoft on all Windows Store applications. Analytics deal with application installations, uninstallations, reviews, etc. Telemetry covers the behavior of the application at runtime, such as the number of crashes and the number of times an application is terminated because it is unresponsive.
An application can be considered to be unresponsive if the UI thread is blocked for too long, but that isn’t the most common cause. In WinRT, applications in the background can be suspended to free up resources for other applications. The suspension process must take 5 seconds or less, otherwise the application will be flagged and terminated.
Microsoft admits the error reporting in the telemetry data isn’t very useful. A lot of the information is intentionally discarded by Microsoft to ensure that they don’t accidentally violate privacy laws by capturing sensitive data. The crash may also be far removed from the actual cause of the problem or, worse, the application may silently lose data without crashing at all.
For these reasons, developers are encouraged to augment the telemetry data with application-specific logging. The most performant way to do this logging is Event Tracing for Windows (ETW), but that framework is difficult to use, so a new logging API has been introduced in Windows 8.1.
This new API is based around the LoggingChannel class. It is a very simple API: it only supports a string message, an optional integer, and a level (verbose, information, warning, error, or critical). LoggingSession hosts the LoggingChannel using a circular, memory-only buffer.
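The circular, memory-only buffer can be illustrated with a minimal sketch. It is written in Python purely as a language-neutral illustration and does not use the WinRT API; the names RingBufferChannel, log_message, and entries, as well as the numeric level values, are assumptions made for this example.

```python
from collections import deque
from enum import IntEnum


class Level(IntEnum):
    # The five levels named in the text; the numeric values are assumed.
    VERBOSE = 0
    INFORMATION = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


class RingBufferChannel:
    """Hypothetical stand-in for a channel hosted by an in-memory session."""

    def __init__(self, name, capacity):
        self.name = name
        # deque(maxlen=...) drops the oldest entry once the buffer is full,
        # which gives the circular behavior.
        self._entries = deque(maxlen=capacity)

    def log_message(self, message, level=Level.VERBOSE, value=None):
        # Each entry is a string message, a level, and an optional integer.
        self._entries.append((level, message, value))

    def entries(self, minimum=Level.VERBOSE):
        # Return the surviving entries at or above the requested level.
        return [entry for entry in self._entries if entry[0] >= minimum]


channel = RingBufferChannel("app", capacity=3)
channel.log_message("starting up", Level.INFORMATION)
channel.log_message("cache miss", Level.VERBOSE, 42)
channel.log_message("retrying download", Level.WARNING, 2)
channel.log_message("download failed", Level.ERROR, 3)

# Only the three most recent entries remain in the buffer.
all_messages = [message for _, message, _ in channel.entries()]
```

Because the buffer holds only the most recent entries, memory use stays bounded, but older context is lost once the capacity is exceeded.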
If a more permanent solution is needed, FileLoggingSession is used instead. As each log file fills up, a LogFileGenerated event is fired. When this happens, the event handler should asynchronously move the log to a long-term storage location. Otherwise, the logs will be silently deleted over time as the infrastructure cleans up the log folder.
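The pattern of reacting to a completed log file can be sketched as follows. This is a Python illustration of the idea, not the FileLoggingSession API: RollingFileLog, move_to_archive, the file naming scheme, and the two temporary folders are all hypothetical, and the handler here runs synchronously for simplicity where a real handler would move the file asynchronously.

```python
import os
import shutil
import tempfile


class RollingFileLog:
    """Hypothetical file-backed log that reports each completed file."""

    def __init__(self, folder, max_lines, on_file_generated):
        self.folder = folder
        self.max_lines = max_lines
        self.on_file_generated = on_file_generated
        self._lines = []
        self._index = 0

    def log(self, message):
        self._lines.append(message)
        if len(self._lines) >= self.max_lines:
            self._roll()

    def _roll(self):
        # Write the full batch to a new file in the transient folder.
        path = os.path.join(self.folder, "log-%d.txt" % self._index)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(self._lines) + "\n")
        self._lines = []
        self._index += 1
        # Notify the application so it can move the file somewhere durable.
        self.on_file_generated(path)


temp_folder = tempfile.mkdtemp()     # stands in for the transient log folder
archive_folder = tempfile.mkdtemp()  # stands in for long-term storage
archived = []


def move_to_archive(path):
    # Move the finished file out of the folder that gets cleaned up.
    target = os.path.join(archive_folder, os.path.basename(path))
    shutil.move(path, target)
    archived.append(target)


log = RollingFileLog(temp_folder, max_lines=2, on_file_generated=move_to_archive)
for number in range(5):
    log.log("event %d" % number)
```

In this sketch the fifth message is still pending in memory when the loop ends, since a file is only produced once it is full.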
For long-running operations, developers should use LoggingActivity. This IDisposable class adds a log entry at the beginning and end of the operation.
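The closest Python analogue to an IDisposable scope is a context manager, so the start-and-stop idea can be sketched that way. The function logging_activity, the log_entries list, and the message format are assumptions for illustration only and are not part of the WinRT API.

```python
import time
from contextlib import contextmanager

log_entries = []


@contextmanager
def logging_activity(name):
    # Record the start of the operation on entry to the block.
    started = time.monotonic()
    log_entries.append("start: " + name)
    try:
        yield
    finally:
        # Record the end on exit, even if the operation raised an error.
        elapsed = time.monotonic() - started
        log_entries.append("stop: %s (%.3f s)" % (name, elapsed))


with logging_activity("sync contacts"):
    time.sleep(0.01)  # placeholder for the long-running work
```

Tying the end entry to scope exit means the stop record is written even when the operation fails partway through.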
In Windows 8, errors that crossed interop boundaries were reduced to just an HRESULT. With WinRT 8.1, there will be a mechanism to get rich exception details.
Unhandled exceptions in non-XAML events, such as a network-changed event, will now be piped through the same unhandled exception handler that currently handles exceptions in XAML events (e.g. button clicked).
CoreApplication.UnhandledErrorDetected is recommended for C++ developers and logging frameworks. C++ developers can’t use the XAML error reporting framework because they don’t have a unified root class for their exceptions.
Microsoft recommends that unhandled exceptions be rethrown so that the application crashes after logging instead of running in an unstable state. The unhandled exception handler is used for observing problems, not hiding them.
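The log-then-rethrow recommendation can be sketched in a few lines. This Python example only illustrates the pattern; run_with_crash_logging, crash_log, and broken_operation are hypothetical names, and the outer try block exists solely so the sketch can show that the error still escapes.

```python
crash_log = []


def run_with_crash_logging(operation):
    try:
        return operation()
    except Exception as error:
        # Observe the problem by recording it...
        crash_log.append("%s: %s" % (type(error).__name__, error))
        # ...then re-raise so the failure is not hidden from the caller.
        raise


def broken_operation():
    raise ValueError("corrupt settings file")


try:
    run_with_crash_logging(broken_operation)
    crashed = False
except ValueError:
    # In a real application nothing would catch this; the process would end.
    crashed = True
```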
Error handling blocks should do as little as possible. All of the setup for logging, such as creating the folder where logs are stored, should be handled during application startup so that it is ready when an error occurs.
To prevent the application from terminating before the log files are completely written, use Task.Wait on the call to save the log to disk. This is the one case where blocking the UI thread is acceptable.
WinRT supports something called a “maintenance task”. This type of background task does not require the user’s permission and can host a background uploader. The major limitation is that it only runs when the device is on AC power, which means the logs may not become available on the server for several days.
Uploads should be performed with the PUT verb, not POST. The difference is that PUT is considered idempotent and thus can be automatically retried. A POST request is assumed not to be safe to retry.
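Why idempotency matters for retries can be seen in a small simulation. This Python sketch uses a fake in-memory transport rather than a real HTTP client; send_with_retry, flaky_send, the URL path, and the retry count are all assumptions chosen for illustration. The fake transport stores the upload and then loses the response on the first call, which is the case where the client cannot tell whether the server received the data.

```python
IDEMPOTENT_VERBS = {"GET", "HEAD", "PUT", "DELETE"}


def send_with_retry(send, verb, url, body, max_attempts=3):
    # Only idempotent verbs are retried; anything else gets a single attempt.
    attempts = max_attempts if verb in IDEMPOTENT_VERBS else 1
    last_error = None
    for _ in range(attempts):
        try:
            return send(verb, url, body)
        except ConnectionError as error:
            last_error = error
    raise last_error


server_files = {}
calls = []


def flaky_send(verb, url, body):
    # Fake transport: the server stores the body, but the first response
    # is lost on the way back to the client.
    calls.append(verb)
    server_files[url] = body
    if len(calls) == 1:
        raise ConnectionError("response lost")
    return 200


# Hypothetical upload URL; the second PUT overwrites the same resource.
status = send_with_retry(flaky_send, "PUT", "/logs/device-1/log-0.txt", "event 0")
```

The repeated PUT leaves the server with exactly one copy of the log, whereas repeating a request that appends or creates a new resource each time could produce duplicates.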
This information was originally presented in the Build 2013 session Making your Windows Store Apps More Reliable.
Todd Montgomery Dec 19, 2014