InfoQ

News

Java In-Memory Persistence with Space4J

Posted by Dionysios G. Synodinos on Sep 20, 2008 05:30 PM

Community: Java
Topics: Data Access, Performance & Scalability

Space4J is a simple database system that lets you work with Java Collections in memory. Since memory is several orders of magnitude faster than disk for random access to data, Space4J provides better scalability for "real-time" web applications and other systems with demanding performance requirements.

With Space4J, instead of performing a SQL SELECT to fetch a User from a database table, the developer accesses the users map (java.util.Map) and calls users.get(id). Since all data is kept in memory inside the JVM, there is no need for an extra database application, a socket connection, a JDBC driver, a SQL statement or any kind of ORM tool. Data is just there, inside objects, inside Java maps. For operations that modify data, a Command object is created, serialized and saved to disk in a log file. At restart, the past commands are read from the log files and re-applied, regenerating exactly the data set you had, for example, before a system crash.
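
The command-log idea described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Space4J's actual API: the names UserCommand, PutUserCommand and LoggedUserStore are hypothetical, and the length-prefixed log format is chosen only to keep the example short.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical command type: every change to the data is a serializable object. */
interface UserCommand extends Serializable {
    void execute(Map<Long, String> users);
}

/** Example command that stores a user name under an id. */
class PutUserCommand implements UserCommand {
    private static final long serialVersionUID = 1L;
    private final long id;
    private final String name;

    PutUserCommand(long id, String name) { this.id = id; this.name = name; }

    @Override
    public void execute(Map<Long, String> users) { users.put(id, name); }
}

/** Keeps users in memory; logs each command to disk before applying it. */
class LoggedUserStore {
    private final Map<Long, String> users = new HashMap<>();
    private final File logFile;

    LoggedUserStore(File logFile) {
        this.logFile = logFile;
        replay(); // rebuild the in-memory state from the log, if one exists
    }

    /** A read is a plain map lookup; the disk is never touched. */
    synchronized String get(long id) { return users.get(id); }

    /** A write is appended to the log first, then applied in memory. */
    synchronized void apply(UserCommand command) {
        appendToLog(command);
        command.execute(users);
    }

    private void appendToLog(UserCommand command) {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(logFile, true))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
                oos.writeObject(command);
            }
            byte[] record = buffer.toByteArray();
            out.writeInt(record.length); // length prefix, then the serialized command
            out.write(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void replay() {
        if (!logFile.exists()) return;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(logFile)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfLog) {
                    break; // no more records
                }
                byte[] record = new byte[length];
                in.readFully(record);
                try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(record))) {
                    ((UserCommand) ois.readObject()).execute(users);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Creating a second LoggedUserStore on the same log file stands in for a restart: the constructor replays every logged command, so get(id) returns the same values as before. For simplicity, the sketch synchronizes reads as well as writes, which is stricter than the concurrency model the article attributes to Space4J.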

To prevent log files from growing too large, the application can periodically take a snapshot of all data to disk. Space4J keeps all data inside the Space object. When taking a snapshot, the whole Space object is serialized and saved to disk. Therefore, at restart only the commands issued since the last snapshot need to be re-applied, not all of them. The size of the snapshot depends on the application. The system also has to enter read-only mode while saving the snapshot to disk, unless a Space4J cluster is used. An example of such a deployment would be a load-balanced web application, where every web server has its own Space4J node from the cluster.
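
A rough sketch of the snapshot mechanism, again under assumptions: SnapshotSpace is a hypothetical stand-in for the Space object, the command log is simplified to an in-memory list of (id, name) entries, and a counter stored inside the snapshot records how many commands it already reflects. Space4J's real file layout and API may differ.

```java
import java.io.*;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the Space object: one serializable holder for all data. */
class SnapshotSpace implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ConcurrentHashMap<Long, String> users = new ConcurrentHashMap<>();
    private long appliedCommands = 0; // how many log entries this state reflects

    /** Reads take no lock, so they keep working while a snapshot is being written. */
    String get(long id) { return users.get(id); }

    /** Writes share one lock with snapshotTo, so they wait while a snapshot is saved. */
    synchronized void put(long id, String name) {
        users.put(id, name);
        appliedCommands++;
    }

    synchronized long appliedCommands() { return appliedCommands; }

    /** Serializes the whole object; writers are blocked for the duration ("read-only mode"). */
    synchronized void snapshotTo(File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the snapshot, then re-applies only the log entries recorded after it. */
    static SnapshotSpace restore(File snapshot, List<Map.Entry<Long, String>> fullLog) {
        SnapshotSpace space;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(snapshot))) {
            space = (SnapshotSpace) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        int alreadyApplied = (int) space.appliedCommands();
        for (Map.Entry<Long, String> entry : fullLog.subList(alreadyApplied, fullLog.size())) {
            space.put(entry.getKey(), entry.getValue());
        }
        return space;
    }
}
```

Because the counter travels with the snapshot, restore skips everything the snapshot already contains, which is what keeps restart time proportional to the work done since the last snapshot rather than to the full history.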

Space4J comes with a complete indexing framework that supports four different types of indexes to search data in a variety of ways. It can also be used alongside a regular database for offline work, data warehousing, reports, etc.

Space4J uses the Java 1.6 concurrent data structures for concurrent read/write access to data, so writers only block other writers, while readers neither block nor get blocked by anything. This means that modifications are done one at a time while read-access operations are done concurrently, without any ConcurrentModificationException.
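
The access pattern described above can be illustrated with java.util.concurrent.ConcurrentHashMap. Which concurrent structures Space4J uses internally is not stated here, so the class below (the hypothetical ConcurrentUserTable) is only a sketch of the general idea: writes are serialized by a lock, reads take no lock, and iteration tolerates concurrent modification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical table of users: writers take turns, readers never wait. */
class ConcurrentUserTable {
    private final Map<Long, String> users = new ConcurrentHashMap<>();
    private final Object writeLock = new Object();

    /** Only one modification runs at a time. */
    void put(long id, String name) {
        synchronized (writeLock) {
            users.put(id, name);
        }
    }

    /** Reads go straight to the map without taking the write lock. */
    String get(long id) { return users.get(id); }

    /**
     * Iterates over the live map and writes to it in the middle of the loop.
     * ConcurrentHashMap iterators are weakly consistent, so this does not throw
     * ConcurrentModificationException; the same loop over a java.util.HashMap would.
     */
    List<String> namesWhileWriting(long newId, String newName) {
        List<String> names = new ArrayList<>();
        for (String name : users.values()) {
            put(newId, newName); // modification during iteration
            names.add(name);
        }
        return names;
    }
}
```

The returned list is guaranteed to contain every name that was present when the iteration started; whether it also contains the entry added during the loop is not specified, which is the trade-off weakly consistent iterators make.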

You can download the latest release (0.9.1) or browse through the source repository.

For more information about the emerging paradigm of shifting data access from disk to memory for performance, and about other data access issues, you can read “RAM is the new disk...” by Steven Robbins, here at InfoQ.

10 comments

  1. Prevayler

    Sep 20, 2008 9:39 PM by Peter Monks

    How does Space4J compare to Prevayler?

  2. Re: Prevayler

    Sep 21, 2008 3:19 AM by Ladislav Thon

    I wanted to ask the very same question, but it is apparently answered on the product site:

    How does Space4J compare to Prevayler? Space4J and Prevayler are two free Java implementations of the same concept. Prevayler has the merit of being the first implementation, created from scratch by Klaus Wuestefeld. Klaus also has the merit of pushing the idea that a prevalent system is both possible and desirable in many cases. Although Space4J and Prevayler are centered on the same idea, they have totally different APIs and implementations. It is pretty much like Struts and JSF for web frameworks.

  3. Broken?

    Sep 21, 2008 5:32 AM by Martin Probst

    I wonder if this is really worth the hassle. You will get into terrible problems once your data exceeds some GB, concurrency is difficult to get right for access over several collections (no transactions), and there is no rollback or similar. For such a limited approach, I question whether it wouldn't be easier and more straightforward to write your own simple write-ahead logging persistence, in particular if you have high performance requirements. And in other cases, just use a database.

  4. My 2 cents

    Sep 22, 2008 8:45 AM by Thai Dang Vu

    If you don't use SQL statements, then it will be difficult to switch to a real database when you need more space than memory can provide.

  5. Caching

    Sep 22, 2008 10:44 AM by Gleb Liferenko

    Such a technology should be used in combination with, not instead of, a relational database, similar to how memcached is used. Also, it would solve your transaction problems.

  6. Re: Broken?

    Sep 23, 2008 4:53 PM by Sergio Oliveira

    Martin said: concurrency is difficult to get right for access over several collections (no transactions)

    This difficult task was done for me by Doug Lea in the java.util.concurrent package. As the article says: "Space4J uses the Java 1.6 concurrent data structures for concurrent read/write access to data so writers only block writers, readers don't block or get blocked by anything. This means that modifications are done one at a time while read-access operations are done concurrently without any ConcurrentModification exceptions!"

    Martin said: there is no rollback or similar

    This is a drawback, I agree. But commands should be atomic by nature, and you can always check for errors before modifying anything. That's possible because every command is executed in isolation from the others; in other words, writes are serialized and never executed concurrently.

    Gleb said: Such a technology should be used in combination with, not instead of, a relational database, similar to how memcached is used. Also, it would solve your transaction problems.

    Yes, you got the correct idea!

  7. GC

    Sep 23, 2008 8:02 PM by Dax Abraham

    What about garbage collection when the size of the object in memory becomes really huge ...

  8. Re: GC

    Sep 27, 2008 8:23 AM by Sergio Oliveira

    You must be talking about the Space object that holds all other collections. This object is never garbage collected, and there is no need for it to be. You should think of Space4J as a relational database, where:

      Database = Space object
      Table = Java collection
      Index = Java Map
      Selects = direct collection access like map.get(id)
      Inserts/Updates/Deletes = Space4J command

  9. How is this different?

    Oct 6, 2008 9:27 AM by Sarath Chandra P

    From Object Caching? From memcached?

  10. Re: How is this different?

    Oct 6, 2008 9:39 PM by Sarath Chandra P

    oops.. I lost context! ignore my earlier question
