
Posted by David Pallmann on May 11, 2009
In Part 1 of this series we introduced a design pattern for grid computing on Azure. In this article we'll implement the pattern by developing a grid application in C# and in Part 3 we'll run the application, first locally and then in the cloud. In order to do that, we'll need some help from a grid computing framework.
Unless you're prepared to write a great deal of infrastructure software, you'll want to use a framework for your grid application that does the heavy lifting for you and lets you focus on writing your application code. While Azure performs many of the services you would want in a grid computing infrastructure, it's still necessary to add some grid-specific functionality between Azure and your grid application. A good grid computing framework should do these things for you:
The diagram below shows how the framework brings the grid application and the Azure platform together. The application developer only has to write application-specific code to load input data, generate tasks, execute tasks, and save result data. The framework provides all of the necessary plumbing and tooling in a way that strongly leverages the Azure platform.

In this article we'll be using Azure Grid, the community edition of the Neudesic Grid Computing Framework. Azure Grid performs all of the functions listed above by providing 4 software components:
Azure Grid minimizes expense by only using cloud resources during the execution of your grid application. On-premise storage is where input data, results, and Azure Grid's tracking database reside. Cloud storage is used for communication with workers to pass parameters and gather results, and drains to empty as your grid application executes. If you also suspend your grid worker deployment when idle, you won't be accruing ongoing charges for storage or compute time once your grid application completes.
The application we'll be coding is a fictional fraud check application that uses rules to compute a fraud likelihood score against applicant data. Each applicant record to be processed will become a grid task. The applicant records have this structure:

By applying business rules to an applicant record, the Fraud Check application computes a numeric fraud likelihood score between 0 and 1000, where zero is the worst possible score. An application will be rejected if it scores below 500.
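The scoring range and rejection rule described above can be sketched in a few lines of C#. This is a minimal illustration only: the class and member names (`FraudDecision`, `Clamp`, `IsRejected`) are hypothetical and are not part of Azure Grid or the Fraud Check source.

```csharp
using System;

// Hypothetical helper; these names are assumptions, not part of Azure Grid.
public static class FraudDecision
{
    public const int MinScore = 0;       // worst possible score
    public const int MaxScore = 1000;    // best possible score
    public const int RejectBelow = 500;  // applications scoring under this are rejected

    // Keeps a computed score inside the 0..1000 range described in the article.
    public static int Clamp(int rawScore) =>
        Math.Max(MinScore, Math.Min(MaxScore, rawScore));

    // True when the clamped score falls below the rejection threshold.
    public static bool IsRejected(int rawScore) => Clamp(rawScore) < RejectBelow;
}
```

Under this sketch a score of exactly 500 is accepted, since the rule rejects only scores below 500.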
When you design a grid application you need to determine the best way to divide up the work to be done into individual tasks that can be performed in parallel. You start by considering 2 key questions:
In the case of Fraud Check, it makes sense to create a separate task for each applicant record: the fraud scoring for each record is an atomic operation, and it doesn't matter what order the records are processed in as long as they all get processed.
Only one task type is needed for Fraud Check, which we'll name "FraudScore". The FraudScore task simply renders a fraud score for an applicant record.
Tasks need to operate on input data and produce results data. The input data for FraudScore will be an applicant record and its results data will be a fraud score plus a text field explaining reasons for the score. FraudScore will expect parameters and return results with the names shown below.

In some grid computing applications, tasks might also need access to additional resources, such as databases or web services, to do their work. FraudScore does not have this requirement, but if it did, some of the input parameters would supply the necessary information, such as web service addresses and database connection strings.
Now that our grid application's input parameters, tasks, and result fields are defined we can proceed to write the application. Azure Grid only asks us to write code for the Loader, the application's tasks, and the Aggregator.
The Loader code is responsible for reading in input data and generating tasks with parameters. Most of the time that data will come from a database, but Fraud Check is written to read input data from a spreadsheet.
Azure Grid gives you the following starting point for your Loader in a class named AppLoader. The method GenerateTasks needs to be implemented to pull your input data and generate tasks with your task type names and your parameters. Your code will create Task objects and return them as an array. The base class, GridLoader, takes care of queuing your tasks into cloud storage where they can execute.

To implement the Loader for Fraud Check, we replace the sample task creation code with this code that reads input records from a spreadsheet CSV file and creates a task for each record.
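As a rough sketch of that idea, the code below reads a CSV whose first row holds parameter names and produces one task per data row. It is not the Azure Grid listing: `GridTask` is a hypothetical stand-in for the framework's Task class (whose real constructor may differ), `CsvTaskLoader` is an invented name, and the naive comma split assumes no field contains a quoted comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the framework's Task class; the real constructor may differ.
public class GridTask
{
    public string TaskType { get; }
    public Dictionary<string, string> Parameters { get; }

    public GridTask(string taskType, Dictionary<string, string> parameters)
    {
        TaskType = taskType;
        Parameters = parameters;
    }
}

// Hypothetical loader helper, not part of Azure Grid.
public static class CsvTaskLoader
{
    // Reads a CSV whose first row holds parameter names and returns one task per data row.
    // Simplification: fields are split on commas, so quoted commas are not supported.
    public static GridTask[] GenerateTasks(TextReader input, string taskType)
    {
        var tasks = new List<GridTask>();
        var headerLine = input.ReadLine();
        if (headerLine == null) return tasks.ToArray();   // empty input, no tasks

        string[] names = headerLine.Split(',');
        for (var line = input.ReadLine(); line != null; line = input.ReadLine())
        {
            if (line.Trim().Length == 0) continue;         // skip blank rows
            string[] values = line.Split(',');
            var parameters = new Dictionary<string, string>();
            for (int i = 0; i < names.Length && i < values.Length; i++)
            {
                parameters[names[i].Trim()] = values[i].Trim();
            }
            tasks.Add(new GridTask(taskType, parameters));
        }
        return tasks.ToArray();
    }
}
```

Taking a `TextReader` rather than a file path keeps the sketch easy to exercise with a `StringReader`; a real loader would more likely open the spreadsheet export with a `StreamReader`.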

The top row of the input spreadsheet should contain parameter names and subsequent rows should contain values, just as shown earlier. Creating a task is simply a matter of instantiating a Task object and giving it the following information in the constructor:
Adding the Task to a List
The bookend to the Loader is the Aggregator, which processes task results and stores them locally.
Azure Grid gives you the following as a starting point for your aggregator in a class named AppAggregator. There are 3 methods to be implemented:
The base class, GridAggregator, takes care of processing results from cloud storage and calling your methods to store results.

In StoreResult, both the parameters and results for the current task are passed in as XML in this format:

To implement the aggregator for Fraud Check, we'll reverse what the Loader did and append each result to a spreadsheet CSV file.
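One way to picture the CSV-appending step is the sketch below. It assumes the XML handed to StoreResult has already been parsed into a dictionary of names and string values, and it writes one row per task result. `CsvResultWriter` and its method signature are hypothetical and are not the AppAggregator or GridAggregator API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Azure Grid: writes one CSV row per task result.
public class CsvResultWriter
{
    private readonly TextWriter _output;
    private readonly string[] _columns;
    private bool _headerWritten;

    public CsvResultWriter(TextWriter output, IEnumerable<string> columns)
    {
        _output = output;
        _columns = columns.ToArray();
    }

    // Quotes a field when it contains a comma, quote, or line break (RFC 4180 style).
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Appends one row; columns missing from the dictionary are written as empty fields.
    // Assumption: the caller has already turned the task's XML into name/value pairs.
    public void StoreResult(IDictionary<string, string> fields)
    {
        if (!_headerWritten)
        {
            _output.WriteLine(string.Join(",", _columns.Select(Escape)));
            _headerWritten = true;
        }
        var row = _columns.Select(c => fields.TryGetValue(c, out var v) ? Escape(v) : "");
        _output.WriteLine(string.Join(",", row));
    }
}
```

Fixing the column order up front keeps every row aligned with the header even when a task omits a field.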

With the loader and aggregator written, there's just one more piece to write: the application code itself. The AppWorker class contains the application task code. The current task is passed to a method named Execute, which examines the task type to determine which task code to execute.

For Fraud Check, the switch statement checks for the one task type in our application, FraudScore, and executes the code to compute a fraud likelihood score based on the applicant data in the input parameters.
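The dispatch-by-task-type idea might look like the following sketch. The class name `WorkerSketch`, the method signatures, and the placeholder result are assumptions made for illustration; the real AppWorker and GridWorker signatures may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical worker sketch; the real AppWorker/GridWorker signatures may differ.
public static class WorkerSketch
{
    // Routes a task to its implementation based on the task type name.
    public static Dictionary<string, string> Execute(
        string taskType, IReadOnlyDictionary<string, string> parameters)
    {
        switch (taskType)
        {
            case "FraudScore":
                return FraudScore(parameters);
            default:
                // An unrecognized task type is treated as a programming error here.
                throw new ArgumentException("Unknown task type: " + taskType, nameof(taskType));
        }
    }

    // Placeholder body: it only records which handler ran, not a real score.
    private static Dictionary<string, string> FraudScore(
        IReadOnlyDictionary<string, string> parameters)
    {
        return new Dictionary<string, string> { ["HandledBy"] = "FraudScore" };
    }
}
```

Adding a second task type under this layout would mean one more `case` label and one more private method.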

The first order of business for the FraudScore code is to extract the input parameters, which are accessible through a dictionary of names and string values in the Task object.

Next, a series of business rules execute to compute the score. Here's an excerpt:

Lastly, FraudScore updates the task with results. This is simply a matter of setting names and string values in a dictionary.
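The three steps just described (read string parameters, apply rules, write string results) can be sketched end to end as below. Everything specific here is invented for illustration: the parameter names `Email` and `YearsAtAddress`, the result names `Score` and `Notes`, the two rules, and their point deductions are assumptions, not the Fraud Check application's actual fields or business rules.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical scoring sketch; field names and rules are illustrative only.
public static class FraudScoreSketch
{
    public static Dictionary<string, string> Run(IReadOnlyDictionary<string, string> parameters)
    {
        // 1. Extract input parameters (all values arrive as strings).
        string email = parameters.TryGetValue("Email", out var e) ? e : "";
        int yearsAtAddress = ParseInt(parameters, "YearsAtAddress");

        // 2. Apply rules: start from the best score and deduct for each rule that fires.
        int score = 1000;
        var reasons = new List<string>();
        if (email.Length == 0)
        {
            score -= 300;
            reasons.Add("missing email");
        }
        if (yearsAtAddress < 1)
        {
            score -= 300;
            reasons.Add("less than one year at address");
        }
        score = Math.Max(0, score);   // never drop below the worst possible score

        // 3. Publish results as name/string pairs.
        return new Dictionary<string, string>
        {
            ["Score"] = score.ToString(CultureInfo.InvariantCulture),
            ["Notes"] = string.Join("; ", reasons)
        };
    }

    // Returns 0 when the parameter is missing or is not a valid integer.
    private static int ParseInt(IReadOnlyDictionary<string, string> p, string name) =>
        p.TryGetValue(name, out var text) &&
        int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value)
            ? value
            : 0;
}
```

Collecting a reason string alongside each deduction mirrors the article's point that the result carries both a score and a text field explaining it.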

The base GridWorker class and WorkerRole implementation take care of queuing the results to cloud storage where they will be retrieved by the Aggregator.
We've developed our grid application and are about ready to run it. Just a quick review of what we've just accomplished: using a framework, we implemented a loader, an aggregator, and task code. We only had to write code specific to our application.
All that remains is to run the application. With a grid application, you should always test carefully, initially by running locally with a small number of tasks. Once you're confident in your application design and code integrity, you can move on to large scale execution in the cloud. We'll be doing just that in the next article in this series, Part 3.
David Pallmann is a consulting director for Neudesic, a Microsoft Gold Partner and National Systems Integrator. Prior to joining Neudesic David worked on the WCF product team at Microsoft. He has published 3 technical books and maintains an active Azure blog. He is also a founding member of the Azure User Group.