Agile in Practice: What Is Actually Going On Out There?
Scott Ambler talks about actual data resulting from surveys made during 2006-2008, showing how Agile is perceived and implemented within organizations.
Tracking change and innovation in the enterprise software development community
Posted by Steven Robbins on Jun 19, 2008 04:31 AM
Jim Gray, a man who has contributed greatly to technology over the past 40 years, is credited with saying that memory is the new disk and disk is the new tape. With the proliferation of "real-time" web applications and systems that require massive scalability, how are hardware and software relating to this meme?[M]emory is several orders of magnitude faster than disk for random access to data (even the highest-end disk storage subsystems struggle to reach 1,000 seeks/second). Second, with data-center networks getting faster, it’s not only cheaper to access memory than disk, it’s cheaper to access another computer’s memory through the network. As I write, Sun’s Infiniband product line includes a switch with 9 fully-interconnected non-blocking ports each running at 30Gbit/sec; yow! The Voltaire product pictured above has even more ports; the mind boggles. (If you want the absolute last word on this kind of ultra-high-performance networking, check out Andreas Bechtolsheim’s Stanford lecture.)Tim also pointed out the truth of the second part of Gray's statement: "For random access, disks are irritatingly slow; but if you pretend that a disk is a tape drive, it can soak up sequential data at an astounding rate; it’s a natural for logging and journaling a primarily-in-RAM application."
Memory is the new disk! With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications. Small (1U, 2U) rack-mounted servers with a terabyte or more or memory will be available soon, and will change how we think about the balance between memory and disk in server architectures. Disk will become the new tape, and will be used in the same way, as a sequential storage medium (streaming from disk is reasonably fast) rather than as a random-access medium (very slow). Tons of opportunities there to develop new products that can offer 10x-100x performance improvements over the existing ones.Dare Obsanjo pointed out how not paying attention to the mantra can have detrimental effects, a la Twitter's issues. Commenting on Twitter's content management-like implementation, Obsanjo said "The problem is that if you naively implement a design that simply reflects the problem statement then you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand coded assembly, the read and write load will kill you." In other words, push the random-access operations into RAM and only use disk for sequential operations.
In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data). So why is this interesting? Well, look at the trends in seek time and transfer rate. Seek time has grown at about 5% a year, whereas transfer rate at about 20%. Seek time is growing more slowly than transfer rate - so it pays to use a model that operates at the transfer rate. Which is what MapReduce does.While it remains to be seen if Solid State Drives (SSD) will change the seek/transfer ratios, many commenters to White's discussion thought that they may be a leveling factor in the RAM/hard drive debate.
provide object-based database capabilities in memory, and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but put it in the *right* place.The primary benefits of an IMDG over direct RDBMS interaction listed were:
Hibernate without Database Bottlenecks
Scale Your Application without Punishing Your Database
Guide to Calculating ROI with Terracotta Open Source JVM Clustering
Why Should I Care About Terracotta?
Terracotta 2.6 - Download now for scalability without tradeoffs
Wicked!
Time to use Prevayler :)
Prevayler is back (maybe it never left?)! It shall rule the world!
You can also use open source object database db4o (developer.db4o.com) configured as an in memory database. And you'd get all the benefits described in the article (in-local cache, no ORM, etc.)
While it remains to be seen if Solid State Drives (SSD) will change the seek/transfer ratios, many commenters to White's discussion thought that they may be a leveling factor in the RAM/hard drive debate.
While RAM is faster than a hard drive, it's not the performance that makes the difference. The hard drive concept is "slow" because it's a shared storage model, and the RAM is "fast" because there's some of it co-located with every CPU. If the hard drives were local then the scalability would be roughly identical, and the scalability is orders of magnitude more important than the raw single-threaded latency in a large system.
Peace,
Cameron Purdy
Oracle Coherence: Data Grid for Java, .NET and C++
Does that mean that all we need to do is replace the current disks with RAM technology to gain speed? The title of the article leads people to think along those lines.
IMO It's not just the speed of memory compared to disks that makes a difference. It's not even the extra benefit of the collocation of CPU and memory. What's really a important is the fact that disk is a sequential storage medium that was designed primarily to store a stream of bytes, not tables of data.
See my recent post on that matter for more details.
Nati S.
GigaSpaces
This is the approach that we've been taking with Web caching for HTTP-based services; serving a response out of memory is infinitely faster than getting it off a disk or from the origin server, and cache peering allows you to reach across the network and get it from a peer. The cyclic COSS filesystem in Squid is a good choice when you *must* go to disk.
Scott Ambler talks about actual data resulting from surveys made during 2006-2008, showing how Agile is perceived and implemented within organizations.
From QCon 2008, Daniel Moth presents on using Visual Studio 2008 and .NET 3.5 to create compelling rich Windows applications.
Joshua Kerievsky, founder of Industrial Logic, talks about Industrial Extreme Programming which extends XP by including practices dealing with management, customers and developers.
Amazon Web Services (AWS) Evangelist Jeff Barr discusses SimpleDB, S3, EC2, SQS, cloud computing, how different Amazon services interact, origins of AWS, AWS globalization and the March AWS outage.
Cloud services have helped bring virtualization to the forefront. Its full power however, also includes other benefits such as high availability, disaster recovery, and rapid provisioning.
John Lam talks about his path to dynamic languages, some of the problems of making IronRuby run fast, and how the DLR helps with implementing languages.
VMware Infrastructure 3: Advanced Technical Design Guide and Advanced Operations Guide provides a wealth of practical insights into setting up virtualization in todays corporate environments.
Can a system that is so large it cannot be comprehended be "designed" in a conventional sense? The foundations of computing are about to change. In this talk, Richard P. Gabriel explores why and how.
7 comments
Reply