Posted by Charles Humble on Feb 26, 2008
Based on Cambridge University's Xen virtualization, Amazon's Elastic Compute Cloud ("EC2") is a computing service that allows users to create, launch and terminate Linux-based server instances on demand. Each virtual machine instance is a virtual private server that is assigned an IP address via DHCP on start-up. Virtual machine images, which Amazon calls Amazon Machine Images (AMIs), can be archived and transported much like VMware's virtual appliances, so a developer can set up an initial instance of the required software and rapidly deploy it to a number of virtual servers.
A previous InfoQ article looked at the appeal of the service for development teams, such as Oracle's Coherence team, who need large amounts of computing power for short periods of time. The flexibility of the service also makes it attractive to web-based start-ups: there are no up-front costs since you don't need to buy expensive hardware, the running costs are generally quite low, and you can install any software you want on your Linux instances. The service can also be readily adapted to changing traffic patterns by starting and stopping additional instances as required. Finally, the service is backed by a well-known name in Amazon, which has a track record of delivering a highly scalable, robust web infrastructure. That said, the lack of any SLA (Service Level Agreement) is a significant barrier to adoption, with some businesses reluctant to entrust data or critical services to it.
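Adapting to traffic by starting and stopping instances comes down to a small calculation. The Python sketch below is illustrative only: it assumes you have measured how many requests per second one instance can handle (`capacity_per_instance`), and the function names and the default bounds of 1 and 20 instances are hypothetical choices, not part of EC2 itself.

```python
import math


def instances_needed(requests_per_sec: float, capacity_per_instance: float,
                     minimum: int = 1, maximum: int = 20) -> int:
    """Number of identical instances to run for the current load, clamped to [minimum, maximum]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up: 4.5 instances' worth of traffic needs 5 instances.
    wanted = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, min(maximum, wanted))


def scaling_action(running: int, wanted: int) -> str:
    """Describe how many instances to start or stop to reach `wanted`."""
    if wanted > running:
        return f"start {wanted - running}"
    if wanted < running:
        return f"stop {running - wanted}"
    return "hold"
```

For example, with 2 instances running and 450 requests per second against a measured capacity of 100 each, `scaling_action(2, instances_needed(450, 100))` returns `"start 3"`. The upper bound acts as a cost ceiling, since every running instance is billed.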
There are also practical problems to overcome. For example, because the virtual servers obtain their addresses via DHCP, the IP address changes each time a server is started. As a consequence, following an outage a web site would need to update its DNS entries, a change that can take up to ninety-six hours to propagate. To work around this, Amazon recommends using a dynamic DNS solution such as DynDNS, and in a recent blog article Codesta's Oliver Chan provides details on how to set up DynDNS for EC2.
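The dynamic DNS workaround boils down to two steps at boot: find the instance's current public IP, then tell the DNS provider about it. The Python sketch below assumes the EC2 instance metadata address `169.254.169.254` (reachable only from inside an instance) and the DynDNS "dyndns2" update endpoint with HTTP Basic authentication; it is not Oliver Chan's setup, and newer instances that enforce the token-based IMDSv2 would need an extra step this sketch does not handle.

```python
import base64
import urllib.parse
import urllib.request

# EC2 instance metadata: link-local address, only reachable from inside an instance.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"
# DynDNS update endpoint (dyndns2 protocol, as we understand it).
UPDATE_ENDPOINT = "https://members.dyndns.org/nic/update"


def build_update_url(hostname: str, ip: str) -> str:
    """Build the DynDNS update URL that points `hostname` at `ip`."""
    query = urllib.parse.urlencode({"hostname": hostname, "myip": ip})
    return f"{UPDATE_ENDPOINT}?{query}"


def current_public_ip(timeout: float = 2.0) -> str:
    """Ask the EC2 metadata service for this instance's current public IP."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def update_dyndns(hostname: str, user: str, password: str) -> str:
    """Point `hostname` at this instance and return the provider's status line."""
    ip = current_public_ip()
    req = urllib.request.Request(build_update_url(hostname, ip))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("User-Agent", "ec2-dyndns-sketch/0.1")  # hypothetical client name
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("ascii").strip()
```

Calling `update_dyndns(...)` from a boot script means a restarted instance re-registers itself instead of waiting for a manual DNS change; a status line such as `good` or `nochg` indicates success.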
The same blog provides some other useful hints for developers considering the EC2 service:
- "Before spending too much time configuring and customizing an AMI, find one that suits your needs from the start so you won't have to redo any work later on down the road. Check out the list of public AMIs in Amazon’s resource center for something that is more suitable for your needs"
- "When packaging up your own image using the ‘ec2-bundle-vol’ command, make sure you specify a clean folder using the '-d' flag otherwise bundling the same image twice will result in an error due to the conflicting sets of temporary files."
- "When working with your image, note that the main drive/partition (where the system files are) has a very limited capacity (10 GB in our case). So when dealing with large files/directories use ‘/mnt’ as it has over 100 GB."
- "If a machine is terminated, all your data will be lost except for what was backed up from the last time you ran an 'ec2-bundle-vol'"
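The bundling tips can be combined: create a brand-new, empty directory on the large `/mnt` partition for every bundling run and pass it via `-d`. The Python sketch below does only that; every other option `ec2-bundle-vol` needs (credentials, account details and so on) is left to the caller through `extra_args` rather than guessed, and the helper names are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import List


def fresh_bundle_dir(base: str = "/mnt") -> str:
    """Create a new, empty directory under `base` for a single bundling run."""
    return tempfile.mkdtemp(prefix="bundle-", dir=base)


def bundle_command(dest: str, extra_args: List[str]) -> List[str]:
    """Build the ec2-bundle-vol command line with -d pointing at `dest`.

    `extra_args` must carry the remaining options your bundling needs;
    they are deliberately not filled in here.
    """
    return ["ec2-bundle-vol", "-d", dest, *extra_args]


def bundle_once(extra_args: List[str], base: str = "/mnt") -> str:
    """Run one bundling pass in a clean directory and return that directory."""
    dest = fresh_bundle_dir(base)
    if os.listdir(dest):  # mkdtemp always creates a new directory, so this is a safety net
        raise RuntimeError(f"{dest} is not empty")
    subprocess.run(bundle_command(dest, extra_args), check=True)
    return dest
```

Because each run gets its own directory, leftover temporary files from an earlier bundle can never collide with a new one, and the output lands on the roomy `/mnt` partition rather than the small root partition.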
As EC2 continues to gain momentum, open source tools and libraries are emerging to make the lives of developers using the service even easier. One such project, building on Chris Richardson's EC2Deploy, is Cloud Tools, which comprises:
Cloud Tools is still very much under development, but it provides a means for developers to get up and running on EC2 in a matter of minutes.
Do I remember correctly that the EC2 service is not useful if you need a persistent database? When your virtual server goes down, the database is gone?
Yes. There are a couple of common workarounds that people seem to be using. One is to use the Amazon SimpleDB service as the database. The other is to use Linux Logical Volume Manager (LVM) snapshots to back up the database to Amazon S3.
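The LVM snapshot approach can be sketched as an ordered list of shell commands. This Python sketch assumes the database lives on an LVM logical volume, that the caller briefly quiesces the database (for example, flushing and locking tables) around the snapshot command, and that `s3cmd` is used for the upload; the volume group, volume and bucket names are hypothetical.

```python
import datetime
import subprocess
from typing import List, Optional


def snapshot_backup_plan(vg: str, lv: str, bucket: str, snap_size: str = "1G",
                         mount_point: str = "/mnt/dbsnap",
                         now: Optional[datetime.datetime] = None) -> List[List[str]]:
    """Return, in order, the commands for one LVM-snapshot backup of a database volume.

    Assumes the caller quiesces the database immediately before the first
    command and releases it right after that command completes.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    snap = f"{lv}-snap"
    archive = f"/mnt/{lv}-{stamp}.tar.gz"
    return [
        # Copy-on-write snapshot: taken almost instantly, so any DB lock is brief.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        ["mkdir", "-p", mount_point],
        ["mount", "-o", "ro", f"/dev/{vg}/{snap}", mount_point],
        ["tar", "czf", archive, "-C", mount_point, "."],
        # Upload with s3cmd (an assumption; any S3 client would do).
        ["s3cmd", "put", archive, f"s3://{bucket}/{lv}-{stamp}.tar.gz"],
        ["umount", mount_point],
        # Drop the snapshot so its copy-on-write space stops filling up.
        ["lvremove", "-f", f"/dev/{vg}/{snap}"],
    ]


def run_plan(plan: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Keeping the plan as data makes it easy to log or dry-run before executing it with `run_plan`. Anything written after the last upload is still lost if the instance is terminated, so the backup frequency sets how much data is at risk.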
Ah, I guess my question was answered in point 4 above, thanks.
Cloud Tools look really cool; will investigate.
I use EC2 quite a bit on my site (sendalong.com). I do use DynDNS, but to better handle servers going up and down quickly, I run one webapp on a "normal" host. This webapp keeps a registry of all available EC2 instances and routes users as needed; this way, even if DNS has not been updated, it can fall back to routing via the very long and ugly EC2 URL.
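A registry like the one described can be quite small. The Python sketch below is a hypothetical illustration, not the sendalong.com code: it round-robins across registered instances and uses an instance's friendly CNAME only once it has been marked as resolving, otherwise falling back to the long EC2-assigned hostname. The class and field names and the example hostnames are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    instance_id: str
    public_dns: str               # long EC2-assigned hostname, usable immediately
    cname: Optional[str] = None   # friendly name such as www1.example.com
    dns_ready: bool = False       # set once the CNAME is known to resolve


class InstanceRegistry:
    """Runs on a conventionally hosted server and tracks live EC2 instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, Instance] = {}
        self._next = 0  # round-robin cursor

    def register(self, inst: Instance) -> None:
        self._instances[inst.instance_id] = inst

    def deregister(self, instance_id: str) -> None:
        self._instances.pop(instance_id, None)

    def mark_dns_ready(self, instance_id: str) -> None:
        self._instances[instance_id].dns_ready = True

    def route(self) -> str:
        """Pick the next live instance and return the URL to send the user to."""
        live: List[Instance] = list(self._instances.values())
        if not live:
            raise LookupError("no EC2 instances registered")
        inst = live[self._next % len(live)]
        self._next += 1
        # Prefer the friendly name; fall back to the raw EC2 hostname until DNS catches up.
        host = inst.cname if inst.cname and inst.dns_ready else inst.public_dns
        return f"http://{host}/"
```

New instances call `register` at boot and are usable straight away through their EC2 hostname; `mark_dns_ready` switches routing to the friendly name once DNS has propagated.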
Also, I've found it good general practice, where possible (as in the setup above), to use unique CNAMEs such as www1.sendalong.com, www2.sendalong.com, etc. for EC2 instances. This means that if www1 goes down you can immediately spawn an instance and map it to www3, www4, etc., and since that CNAME hasn't been used before, the DNS update will happen very quickly (if you reused www1 with a new IP, the DNS update would take a while to propagate).
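The "never reuse a name" rule can be enforced with a simple allocator that always hands out one number above the highest ever used. This Python sketch is a hypothetical helper; it relies on the caller keeping a permanent list of every name issued, including those of dead instances. It works because resolvers hold no cached record for a name nobody has looked up yet, though a name that was queried before it existed may still be negatively cached for a while.

```python
import re
from typing import Iterable


def next_fresh_cname(used: Iterable[str], domain: str, prefix: str = "www") -> str:
    """Return prefix<N>.domain with N one higher than any number ever used.

    `used` must include every name handed out so far, even for instances that
    have since died, so that no name is ever pointed at a second IP address.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.{re.escape(domain)}$")
    highest = 0
    for name in used:
        m = pattern.match(name)
        if m:
            highest = max(highest, int(m.group(1)))
    return f"{prefix}{highest + 1}.{domain}"
```

With `www1` and `www2` already issued, the next replacement instance gets `www3.sendalong.com`, matching the scheme described above; unrelated names such as `mail.sendalong.com` are ignored.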
Jon Chase
www.sendalong.com - Send large files to anyone
They actually have another service called Amazon SimpleDB, which acts as a database via a web service. Not exactly what some might want, but they will sell it to you! Amazon has a really good developer forums section for all its products, which is a great resource if you want to develop using their APIs.
Thanks for the heads-up on this! I have an application right now that will very likely require the variable scaling that EC2 can provide, and it looks like it will do it at a fraction of the cost of other more traditional solutions.
Dave Rooney
Mayford Technologies
Lack of persistent disk out of the box and IP addresses that are not preserved between reboots are indeed the two primary issues that make hosting public web sites on Amazon EC2 "tricky" (though not impossible).
Instead of addressing the problem head on, have you considered multisourcing your infrastructure? Let Amazon EC2 host your highly scalable, computationally heavy logic, but host the database layer, and possibly a thin static-IP layer, somewhere else. I described our multisourcing technique, called VcubeV, in my article in the February issue of Linux Journal: www.linuxjournal.com/article/9915 (available to non-subscribers in March 08); additional info can be found at elasticserver.blogspot.com/2008/01/introducing-...
Since database writes are far less frequent than reads, and excellent caching (memcached, for example) is available, this is quite doable.
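The read-heavy argument rests on the cache-aside pattern: reads are served from a cache on the EC2 side and only misses travel to the remotely hosted database, while writes go to the database and invalidate the cached copy. In this Python sketch a plain dict stands in for a memcached client, and `load`/`store` are hypothetical callables wrapping the remote database; a real memcached deployment would also set expiry times, which this sketch omits.

```python
from typing import Any, Callable, Dict


class CacheAside:
    """Cache in front of a remotely hosted database (cache-aside pattern)."""

    def __init__(self, load: Callable[[str], Any], store: Callable[[str, Any], None]) -> None:
        self._cache: Dict[str, Any] = {}  # stand-in for a memcached client
        self._load = load                 # reads from the remote database
        self._store = store               # writes to the remote database
        self.misses = 0                   # number of reads that reached the database

    def get(self, key: str) -> Any:
        if key in self._cache:
            return self._cache[key]       # served locally, no database round trip
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def put(self, key: str, value: Any) -> None:
        self._store(key, value)           # the database stays the source of truth
        self._cache.pop(key, None)        # invalidate so the next read reloads it
```

Only the first read of a key and the first read after each write reach the database, so the fewer the writes relative to reads, the less the distance to the database matters.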
- Dmitriy
www.cohesiveft.com/elastic/
Take a look at the offering from Elastra (www.elastra.com).
They allow a standard RDBMS (MySQL, etc.) to be deployed on EC2 instances and use S3 to store the data in real time (not as a backup solution).