Jesper Boeg on Priming Kanban
In this interview, Jesper Boeg, author of the new InfoQ book – Priming Kanban, discusses the keys to using Kanban effectively, and how to get started if you are currently using other approaches.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.

Posted by David Pallmann on May 25, 2009

Let's get our set up out of the way so we can start running.
Azure Grid provides you with a base solution template to which you add application-specific code. Last time, we added the code for the Fraud Check grid application but didn't cover the solution structure itself. Let's do so now.
Our grid computing solution has both cloud-side software and enterprise-side software. The Azure Grid framework holds all of this in a single solution, shown below. So what's where in this solution?

Running the cloud-side and on-premise side parts of the solution is simple:
For testing everything locally, you may find it useful to set your startup projects to both AzureGrid and Grid Manager so that a single F5 launches both sides of the application.
Azure Grid tracks job runs, tasks, parameters, and results in a local SQL Server or SQL Server Express database (2005 or 2008). The project download on CodePlex includes the SQL script for creating this database.
Both the cloud-side and on-premise parts of the solution have a small number of configuration settings to be attended to, the most important of which relate to cloud storage.
Cloud-side Grid Worker settings are specified in the AzureGrid project's ServiceConfiguration.cscfg file as shown below.

Configuration settings for the on-premise Grid Manager are specified in the Grid Manager project's App.Config file. Most of the settings have the same names and meaning as just described for the cloud- side configuration file and need to specify the same values. There's also a connection string setting for the local SQL Server database used for grid application tracking.

Since grid applications can be tremendous in scale, you certainly want to test them carefully. In an Azure-based grid computing scenario, I recommend the following sequence of testing for brand new grid applications:

1. Developer Test.
Test the grid application on the local developer machine with a small number of tasks. Here your focus is verifying that the application does what it should and the right data is moving between the grid application and on-premise storage.
The Grid Manager of course executes on the local machine which is always the case.
2. QA Test.
Test the application with a larger number of tasks using multiple worker machines, still local. The goal is to verify that what worked in a single-machine environment also works in a multiple machine environment using cloud storage.
Note that while this is still primarily a local test, we're now using Azure cloud storage. This is necessary as we wouldn't be able to coordinate work across multiple machines otherwise.
3. Staging. Deploy the grid application to Azure hosting in the Staging environment and run the application with a small number of tasks. In this phase you are verifying that what worked locally also works with cloud-hosting of the application.
4. Production. Now you're ready to promote the application to the Azure hosting Production environment and run the application with a full workload.
You can use Azure Manager to monitor the grid application is running as it should.
Now we're all set to try a local test on our developer machine. As per our test plan, we'll initially test with a small workload on the local developer machine.
The input data is the CSV worksheet shown below which was created in Excel and saved as a CSV file. The Fraud Check application expects this input file to reside in the account where GridManager.exe launches from. We'll use 25 records in this test.

Ensure your database is set up and that your cloud-side and enterprise-side configuration settings reflect local developer storage.
Build the application and launch both the AzureGrid project and the GridManager project.
The AzureGrid project won't display anything visible, but you can check it is operating via the Developer Fabric UI if you wish.
The GridManager application is a WPF-based console that looks like this after launch:

To launch a job run of your application, select Job > New Job from the menu. In the dialog that appears, give your job a name and description.

Click OK and you will see the job has been created but is not yet running. The job panel is red to indicate the job has not been started.

Now we're ready to kick off the grid application. Click the Start button, and shortly you'll see the job panel turn yellow and a grid of tasks will be displayed.

The display will update every 10 seconds. Soon you'll see some tasks are red, some are yellow, and some are green. Tasks are red while pending, yellow while being executed, and green when completed.

When all tasks are completed, the job panel will turn green.

With the grid application completed, we can examine the results. Fraud Check was written to pull its input parameters from an input spreadsheet and write its results to an output spreadsheet. Opening the newly created output spreadsheet, we can see the grid application has created a score, explanatory text, and an approval decision for each of the input records.

In the test run we just performed, you saw Grid View, where you can get an overall sense of how the grid application is executing. Each box represents a grid task and shows its task Id. Below you can see how Grid View looks with a larger number of tasks.

In addition to the grid view, you can also look at tasks and results. The Grid View, Task View, and Results View buttons take you to different views of the grid application.
During or after a job run, you can view tasks by clicking the Task View button. The Task View shows a row for each task displaying its Task Id, Task Type, Status, and Worker machine name. When you're running locally you'll always see your own local machine name listed, but when running in the cloud you'll see the names of the specific machine in the cloud executing each task.

Result View is similar to Task View--one row per task--but shows you the input parameters and results for each task, in XML form. You may want to expand the window to see the information fully.
We've seen the grid application work on a local machine, now let's run it in the cloud. That requires us to change our configuration settings to use a cloud storage project rather than local developer storage; and also to deploy the AzureGrid project to a cloud-hosted application project.
The steps to deploy to the cloud are the same as for any hosted Azure application:
1. Right-click the AzureGrid project and select Publish.
2. On the Azure portal, go to your hosted Azure project and click the Deploy button under Staging to upload your application package and configuration files.
3. Click Run to run your instances.

Next, prepare your input. In our local test run the input to the application was a spreadsheet containing 25 rows of applicant information to be processed. This time we'll use a larger workload of 1,000 rows.
We only need to launch the Grid Manager application locally this time since the cloud-side is already running on the Azure platform.
Again we kick off a job by selecting Job, New Job from the Azure Manager menu.

As before, we Click OK to complete the dialog and then click Start to begin the job. The loader generates 1,000 tasks which you can watch execute in Grid View.

Switching to Task View, you can see the machine names of the cloud workers that are executing each task.

Even before the grid application completes you can start examining results through Results View (shown earlier).

With the job complete, cloud storage is already empty. We also suspend the deployment running in the cloud since there is no work for it to avoid accruing additional compute charges.
Once the application completes, the expected output (an output CSV file in the case of Fraud Check) is present and has the expected number of rows and results for each.

We can see 1,000 rows were processed, but the records aren't in the same order. That's normal: we can't control the order of task execution nor are we concerned with it as each task is atomic in nature. That's one reason you'll typically copy some of the input parameters to your output, to allow correlation of the results.
It's difficult to assess the performance of grid computing applications on Azure today because we are in a preview period where the number of instances a developer can run in the cloud is limited to 2 per project.
There is one way to run more than 2 instances of your grid computing application today, and that's to host your grid workers in more than one place. You can use multiple hosted accounts, perhaps teaming up with another Azure account holder. You can also run some workers locally. This works because the queues in cloud storage are the communication mechanism between grid workers and the enterprise loader/aggregator. As long as you use a common storage project, you can diversify where your grid workers reside.
We can also expect some things that are generally true of grid computing applications to be true on Azure: for example, compute-intensive applications are likely to bring the greatest return on a grid computing approach.
As we move to an expected Azure release by year's end, we should be able to collect a lot of meaningful data about grid computing performance and how to tune it.
A
In this interview, Jesper Boeg, author of the new InfoQ book – Priming Kanban, discusses the keys to using Kanban effectively, and how to get started if you are currently using other approaches.
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
1 comment
Watch Thread Reply