Posted by Steven Robbins on Jun 19, 2008
[M]emory is several orders of magnitude faster than disk for random access to data (even the highest-end disk storage subsystems struggle to reach 1,000 seeks/second). Second, with data-center networks getting faster, it’s not only cheaper to access memory than disk, it’s cheaper to access another computer’s memory through the network. As I write, Sun’s Infiniband product line includes a switch with 9 fully-interconnected non-blocking ports each running at 30Gbit/sec; yow! The Voltaire product pictured above has even more ports; the mind boggles. (If you want the absolute last word on this kind of ultra-high-performance networking, check out Andreas Bechtolsheim’s Stanford lecture.)
Tim also pointed out the truth of the second part of Gray's statement: "For random access, disks are irritatingly slow; but if you pretend that a disk is a tape drive, it can soak up sequential data at an astounding rate; it’s a natural for logging and journaling a primarily-in-RAM application."
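The phrase "logging and journaling a primarily-in-RAM application" describes a familiar pattern, and a minimal sketch makes it concrete. All reads are served from a dictionary in memory. Every write is appended to the end of a journal file, so the disk sees only sequential I/O. On restart, the state is rebuilt by replaying the journal front to back. The class name `JournaledStore` and the JSON-lines journal format are hypothetical choices for illustration. This is not the design of any particular product.

```python
import json
import os
import tempfile


class JournaledStore:
    """Hypothetical in-RAM key/value store: reads hit a dict, and every write
    is appended to a journal file so the disk only sees sequential I/O."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()
        # Append mode: every write goes to the end of the file, like a tape.
        self._journal = open(journal_path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild in-memory state at startup by streaming the journal in order.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Journal first (sequential write, forced to disk), then update RAM.
        self._journal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        self.data[key] = value

    def get(self, key):
        # Random-access reads never touch the disk.
        return self.data.get(key)

    def close(self):
        self._journal.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = JournaledStore(path)
    store.put("user:1", "alice")
    store.put("user:2", "bob")
    store.put("user:1", "alicia")
    store.close()

    recovered = JournaledStore(path)  # simulates a process restart
    print(recovered.get("user:1"), recovered.get("user:2"))  # prints: alicia bob
    recovered.close()
```

Because the journal grows forever in this sketch, a real system would also need periodic snapshots so that replay stays short.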
Memory is the new disk! With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications. Small (1U, 2U) rack-mounted servers with a terabyte or more of memory will be available soon, and will change how we think about the balance between memory and disk in server architectures. Disk will become the new tape, and will be used in the same way, as a sequential storage medium (streaming from disk is reasonably fast) rather than as a random-access medium (very slow). Tons of opportunities there to develop new products that can offer 10x-100x performance improvements over the existing ones.
Dare Obasanjo pointed out how not paying attention to the mantra can have detrimental effects, à la Twitter's issues. Commenting on Twitter's content-management-like implementation, Obasanjo said "The problem is that if you naively implement a design that simply reflects the problem statement then you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand coded assembly, the read and write load will kill you." In other words, push the random-access operations into RAM and use disk only for sequential operations.
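One way to "push the random-access operations into RAM and use disk only for sequential operations" is to buffer updates in memory and flush them as a key-sorted run. This minimal sketch assumes string keys and values without tabs or newlines. The class name `WriteBuffer`, the `max_entries` threshold and the run-file naming are hypothetical. Updates can arrive in any key order, but the disk only ever sees a single front-to-back write per flush.

```python
import os
import tempfile


class WriteBuffer:
    """Hypothetical sketch: random-order updates land in RAM; the disk only
    sees one sequential write of a key-sorted run per flush."""

    def __init__(self, directory, max_entries=4):
        self.directory = directory
        self.max_entries = max_entries
        self.buffer = {}
        self.runs = []  # paths of the sorted run files written so far

    def put(self, key, value):
        self.buffer[key] = value  # random access happens in memory only
        if len(self.buffer) >= self.max_entries:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        path = os.path.join(self.directory, f"run-{len(self.runs):04d}.txt")
        # One front-to-back write of sorted lines: the disk is used like tape.
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.buffer):
                f.write(f"{key}\t{self.buffer[key]}\n")
        self.runs.append(path)
        self.buffer.clear()


def read_run(path):
    # Runs are read back sequentially as well.
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]


if __name__ == "__main__":
    wb = WriteBuffer(tempfile.mkdtemp(), max_entries=3)
    for k in ["pear", "apple", "fig", "kiwi"]:
        wb.put(k, k.upper())
    wb.flush()
    for run in wb.runs:
        print(read_run(run))
```

Keeping each run sorted means later reads or merges can also stream through the files sequentially rather than seeking.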
In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data). So why is this interesting? Well, look at the trends in seek time and transfer rate. Seek time has grown at about 5% a year, whereas transfer rate at about 20%. Seek time is growing more slowly than transfer rate - so it pays to use a model that operates at the transfer rate. Which is what MapReduce does.
While it remains to be seen if Solid State Drives (SSD) will change the seek/transfer ratios, many commenters to White's discussion thought that they may be a leveling factor in the RAM/hard drive debate.
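The "sort and merge while streaming at the transfer rate" idea White describes is essentially an external merge sort. This is a minimal single-machine sketch that assumes integer input. The function name `external_sort` and the `chunk_size` and `workdir` parameters are hypothetical, and this is not Hadoop's implementation. Each in-memory chunk is sorted and streamed to disk as a run. The runs are then merged lazily with `heapq.merge` in one sequential pass over each file.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size, workdir):
    """Sort integers chunk by chunk in RAM, stream each sorted run to disk,
    then merge all runs with one sequential pass over each file."""
    run_paths = []
    chunk = []

    def spill():
        chunk.sort()  # in-memory sort of one chunk
        path = os.path.join(workdir, f"run{len(run_paths)}.txt")
        with open(path, "w") as out:
            out.writelines(f"{v}\n" for v in chunk)  # sequential write
        run_paths.append(path)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            spill()
    if chunk:
        spill()

    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in fh) for fh in files]
        # heapq.merge repeatedly yields the smallest head among the runs.
        # The result is collected into a list only so the demo can inspect it;
        # a real job would stream it to another file instead.
        return list(heapq.merge(*streams))
    finally:
        for fh in files:
            fh.close()


if __name__ == "__main__":
    data = [(i * 37) % 101 for i in range(25)]
    print(external_sort(data, chunk_size=10, workdir=tempfile.mkdtemp()))
```

Every disk access here is a sequential read or write of a whole run, which is why this style of processing is bounded by transfer rate rather than seek rate.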
In-memory data grids (IMDGs) provide object-based database capabilities in memory and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but is put in the *right* place.
The primary benefits of an IMDG over direct RDBMS interaction listed were:
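To make "indexing and querying, transactional semantics and locking" concrete, here is a minimal single-process sketch. It is not the API of any IMDG product, and `InMemoryGrid`, `Trade`, `transact` and `query_symbol` are hypothetical names. Objects live in RAM with a secondary index on one field, and a batch of updates is applied under one lock. The "transaction" is a simplification: the whole batch is validated before anything is written, so a rejected batch leaves no partial state. Real grids also handle partitioning, replication and rollback.

```python
import threading
from dataclasses import dataclass


@dataclass
class Trade:
    trade_id: int
    symbol: str
    qty: int


class InMemoryGrid:
    """Hypothetical single-node stand-in for an IMDG partition."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_id = {}      # primary storage: trade_id -> Trade
        self._by_symbol = {}  # secondary index: symbol -> set of trade_ids

    def _put_unlocked(self, trade):
        old = self._by_id.get(trade.trade_id)
        if old is not None:
            # Keep the index consistent when an object is replaced.
            self._by_symbol[old.symbol].discard(old.trade_id)
        self._by_id[trade.trade_id] = trade
        self._by_symbol.setdefault(trade.symbol, set()).add(trade.trade_id)

    def transact(self, trades):
        """Validate the whole batch, then apply all of it under one lock."""
        with self._lock:
            if any(t.qty <= 0 for t in trades):
                raise ValueError("rejected: quantity must be positive")
            for t in trades:
                self._put_unlocked(t)

    def query_symbol(self, symbol):
        # Indexed query: no scan over every stored object.
        with self._lock:
            ids = self._by_symbol.get(symbol, set())
            return sorted((self._by_id[i] for i in ids), key=lambda t: t.trade_id)
```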
Wicked!
Time to use Prevayler :)
Prevayler is back (maybe it never left?)! It shall rule the world!
You can also use the open source object database db4o (developer.db4o.com) configured as an in-memory database. And you'd get all the benefits described in the article (in-local cache, no ORM, etc.)
While RAM is faster than a hard drive, it's not the performance that makes the difference. The hard drive concept is "slow" because it's a shared storage model, and the RAM is "fast" because there's some of it co-located with every CPU. If the hard drives were local then the scalability would be roughly identical, and the scalability is orders of magnitude more important than the raw single-threaded latency in a large system.
Peace,
Cameron Purdy
Oracle Coherence: Data Grid for Java, .NET and C++
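Cameron's point is that scalability comes from co-locating a slice of the data with each CPU. Hash partitioning is one simple way to picture that. In this minimal sketch, each list entry stands in for a server with its own RAM. Every key has exactly one owner, so a lookup touches only one partition's local memory. `PartitionedMap` is a hypothetical name, and the CRC32-modulo placement is an illustrative assumption. With plain modulo, changing the partition count remaps most keys, which is one reason production grids use more elaborate placement schemes.

```python
import zlib


class PartitionedMap:
    """Hypothetical sketch: each partition owns a slice of the keys, so a
    lookup touches exactly one partition's local memory."""

    def __init__(self, partition_count):
        self.partitions = [dict() for _ in range(partition_count)]

    def owner(self, key):
        # crc32 is stable across processes, unlike Python's salted str hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.owner(key)][key] = value

    def get(self, key):
        return self.partitions[self.owner(key)].get(key)


if __name__ == "__main__":
    m = PartitionedMap(4)
    for i in range(1000):
        m.put(f"order:{i}", i)
    # Each partition holds roughly a quarter of the keys.
    print([len(p) for p in m.partitions])
```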
Does that mean that all we need to do is replace the current disks with RAM technology to gain speed? The title of the article leads people to think along those lines.
IMO, it's not just the speed of memory compared to disks that makes a difference. It's not even the extra benefit of the collocation of CPU and memory. What's really important is the fact that disk is a sequential storage medium that was designed primarily to store a stream of bytes, not tables of data.
See my recent post on that matter for more details.
Nati S.
GigaSpaces
This is the approach that we've been taking with Web caching for HTTP-based services; serving a response out of memory is infinitely faster than getting it off a disk or from the origin server, and cache peering allows you to reach across the network and get it from a peer.
The cyclic COSS filesystem in Squid is a good choice when you *must* go to disk.
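The lookup order in this comment runs from local memory to peer caches to the origin server. This minimal sketch shows that order. `PeeringCache` is a hypothetical class. A peer query here is a plain method call, whereas real caches such as Squid query their siblings over a network protocol (for example ICP). Expiry, eviction and disk fallback are left out.

```python
class PeeringCache:
    """Hypothetical sketch: try local memory, then peer caches, and only then
    the origin server."""

    def __init__(self, origin_fetch):
        self.local = {}                  # url -> response body held in RAM
        self.peers = []                  # other PeeringCache instances
        self.origin_fetch = origin_fetch
        self.origin_hits = 0

    def lookup_local(self, url):
        # What a peer answers when asked: memory only, never the origin.
        return self.local.get(url)

    def get(self, url):
        hit = self.local.get(url)
        if hit is not None:
            return hit
        for peer in self.peers:
            hit = peer.lookup_local(url)
            if hit is not None:
                self.local[url] = hit    # keep our own in-memory copy
                return hit
        self.origin_hits += 1
        body = self.origin_fetch(url)
        self.local[url] = body
        return body


if __name__ == "__main__":
    a = PeeringCache(lambda url: f"body of {url}")
    b = PeeringCache(lambda url: f"body of {url}")
    a.peers.append(b)
    b.peers.append(a)
    a.get("/index.html")  # miss everywhere: fetched from origin by a
    b.get("/index.html")  # served from peer a's memory
    print(a.origin_hits, b.origin_hits)
```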
For a multithreaded, indexed, clustered and simple in-memory Java collection persistence system. ;-)
www.space4j.org/
Very good article.
However, you say in the advantages of IMDG over RDBMS that:
"Data can be accessed by reference"
but I've never worked on a project where this is possible. In n-tier applications (e.g. Java server-based HTTP) you always have to look up objects by some kind of ID because of the request/response mechanism.
Recently I've been working with Flex and Java using BlazeDS, in which case the object I'm manipulating in the client is serialised over the wire (Java to ActionScript to Java). Thus the object that gets passed to my invoked methods does not have the same reference and I have to do a lookup by ID anyway. (Some Adobe fan might point out that LiveCycle DataServices can actually handle this, but what is it doing under the covers? I don't know for sure, but I imagine it's passing IDs around.)
In both cases this has to happen regardless of whether the storage is an OODBMS, RDBMS, some kind of fancy caching or just stored in collections.
So I guess my question actually is, can you give an example of a scenario in which data being accessed by reference is an advantage?
Cheers,
Chris
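Chris's question turns on where the code runs relative to the data. This minimal sketch uses hypothetical `Customer` and `Order` classes, and a `pickle` round trip stands in for a request/response hop such as BlazeDS serialization. Inside one process, an order can hold a direct reference to its customer, so navigation is just following a pointer with no lookup. Once an object crosses the wire, the receiver holds a copy and has to resolve it by ID again. This is one possible reading of "accessed by reference", namely logic co-located with the data in the same process, not a claim about what the article's author meant.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer: Customer  # direct object reference instead of a foreign key


customers = {1: Customer(1, "Chris")}
orders = [Order(100, customers[1]), Order(101, customers[1])]

# In-process: both orders point at the very same Customer object, and
# order -> customer navigation needs no lookup at all.
shared = orders[0].customer is orders[1].customer

# After a serialization round trip (a stand-in for a request/response hop),
# the receiver gets a copy, so identity with the server's object is lost...
received = pickle.loads(pickle.dumps(orders[0]))
same_after_wire = received.customer is customers[1]

# ...and we are back to resolving the object by its ID.
resolved = customers[received.customer.customer_id]

if __name__ == "__main__":
    print(shared, same_after_wire, resolved is customers[1])
```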