InfoQ Dev Summit Munich: In-Memory Java Database EclipseStore Delivers Faster Data Processing

At the InfoQ Dev Summit Munich, Markus Kett presented a database alternative for Java: the in-memory EclipseStore promises faster data processing with lower cloud costs and fewer CO2 emissions. It stores Java object graphs as binary files in local or cloud file systems and uses Java Streams for queries. However, applications must manage concurrent writes themselves and need the commercial MicroStream Cluster to share storage among multiple JVM instances.

The object graphs of Java applications have an impedance mismatch with databases: relational databases store data in rows and columns, whereas NoSQL databases store key-value pairs, JSON documents, and other formats. Applications can do the necessary conversions themselves or rely on Object-Relational Mapping (ORM) frameworks, like Hibernate or Spring Data, to do so. Either way, memory and CPU usage increase, as does latency. Local caches or remote caching layers, such as Redis, reduce latency but add complexity and expense.

EclipseStore is an Eclipse open-source project. Like other in-memory databases, it can store the data in RAM only or persist it to the local file system. But in cloud containers, an application loses local files, at least with new versions of the container image. That's why file services like Amazon S3, Azure Blob Storage, or Google Cloud Firestore are a better fit for EclipseStore in the cloud.

EclipseStore loads the IDs of all stored objects into memory at startup. This increases the application's memory needs linearly with the number of stored objects. By default, the full objects are lazily loaded on demand, but objects can be eagerly loaded during initialization.
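The lazy-loading idea described above can be illustrated with a minimal sketch in plain Java. This is not the EclipseStore API: `LazyRef` and its loader are hypothetical names introduced here, and the `Supplier` merely stands in for reading an object from storage on first access.

```java
import java.util.function.Supplier;

/** Hypothetical lazy reference: holds only a loader until the value is first requested. */
final class LazyRef<T> {
    private final Supplier<T> loader; // stands in for reading the object from storage
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Loads the value on first access and returns the cached value afterwards. */
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

class LazyLoadingDemo {
    public static void main(String[] args) {
        LazyRef<String> article = new LazyRef<>(() -> "article loaded from storage");
        System.out.println(article.isLoaded()); // false: nothing has been read yet
        System.out.println(article.get());      // first access triggers the load
        System.out.println(article.isLoaded()); // true: the value is now cached
    }
}
```

Eager loading would correspond to calling `get()` on every reference during initialization, trading startup time and memory for faster first access.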

EclipseStore does not use a query language like SQL. Instead, it uses the Java Stream API:

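// groupingBy yields a Map keyed by vendor, not a List; the Vendor type is assumed here.
// Requires: import static java.util.stream.Collectors.groupingBy;
Map<Vendor, List<Article>> unavailableArticlesByVendor =
  shop.getArticles().stream()
    .filter(article -> !article.available())
    .collect(groupingBy(article -> article.vendor()));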

EclipseStore queries against objects in memory take microseconds to complete, compared to dozens or hundreds of milliseconds for queries against traditional databases. However, queries against objects in storage are much slower in EclipseStore: It must potentially load millions of objects from the cloud file storage and evaluate them inside the application. A traditional database server, on the other hand, runs queries locally and only sends the results to the Java application over the network.
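To make the in-memory query style concrete, here is a self-contained sketch of such a Stream query. The `Vendor`, `Article`, and `Shop` types and the sample data are assumptions for illustration; the article does not show the real domain model, and no EclipseStore classes are involved.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical domain types; the article does not show the real ones.
record Vendor(String name) {}

record Article(String name, Vendor vendor, boolean available) {}

record Shop(List<Article> articles) {
    List<Article> getArticles() {
        return articles;
    }
}

class StreamQueryDemo {
    /** Groups all unavailable articles by vendor, entirely in memory. */
    static Map<Vendor, List<Article>> unavailableByVendor(Shop shop) {
        return shop.getArticles().stream()
            .filter(article -> !article.available())
            .collect(Collectors.groupingBy(Article::vendor));
    }

    public static void main(String[] args) {
        Vendor acme = new Vendor("Acme");
        Vendor globex = new Vendor("Globex");
        Shop shop = new Shop(List.of(
            new Article("Keyboard", acme, false),
            new Article("Mouse", acme, true),
            new Article("Monitor", globex, false)));
        // Prints a map with one unavailable article per vendor.
        System.out.println(unavailableByVendor(shop));
    }
}
```

Because the whole object graph must be reachable in memory for this to be fast, the same query over lazily loaded data would first have to pull the objects from storage.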

With EclipseStore, loading and storing individual objects is very fast, as there is only a single layer with straightforward conversion to the proprietary EclipseStore data format. With traditional databases, the overhead is much bigger: The popular Spring Data JPA ORM has two more layers underneath, Hibernate and the database driver, and all three layers process and convert data.

EclipseStore stores objects in a proprietary binary format. Saving object changes adds a new binary file and is a blocking, transaction-safe operation. Every object is only stored once. In addition to cloud file systems, EclipseStore can also use databases for storage. EclipseStore does not manage concurrent writes, so applications must prevent threads from overwriting each other.
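One way an application might prevent threads from overwriting each other is to guard every mutation and the subsequent store call with the same lock. The sketch below uses the standard `ReentrantReadWriteLock`; the `GuardedArticles` class and its `persister` hook are hypothetical stand-ins, not EclipseStore APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Guards a shared object graph so only one thread mutates and stores it at a time. */
final class GuardedArticles {
    private final List<String> articles = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Consumer<Object> persister; // hypothetical hook standing in for the store call

    GuardedArticles(Consumer<Object> persister) {
        this.persister = persister;
    }

    /** Mutation and store happen under the same write lock. */
    void add(String article) {
        lock.writeLock().lock();
        try {
            articles.add(article);
            persister.accept(articles);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers share the read lock and receive an immutable snapshot. */
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return List.copyOf(articles);
        } finally {
            lock.readLock().unlock();
        }
    }
}

class ConcurrentWriteDemo {
    public static void main(String[] args) throws InterruptedException {
        GuardedArticles shop = new GuardedArticles(graph -> { /* no-op persister for the demo */ });
        Thread a = new Thread(() -> { for (int i = 0; i < 1_000; i++) shop.add("a" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1_000; i++) shop.add("b" + i); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(shop.snapshot().size()); // 2000: no write was lost
    }
}
```

Since saving is described as a blocking operation, holding the write lock across the store call serializes writers; whether that is acceptable depends on the application's write volume.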

A garbage collector deletes old object versions and corrupt files. A web application displays the data of an EclipseStore instance, which also has REST and GraphQL APIs. Popular tools for querying databases do not yet support EclipseStore.

MicroStream is the company behind EclipseStore. It started development in 2013 and open-sourced the project in 2021, which then became an Eclipse project in 2023. Helidon and Micronaut offer first-class support for EclipseStore.

MicroStream offers a beta version of MicroStream Enterprise as a commercial, supported version of EclipseStore. Like EclipseStore, it serves only a single JVM instance, but it adds indexing, automated lazy loading, and asynchronous writes. Serving multiple JVM instances, whether from one application or several, requires the commercial MicroStream Cluster, which is available on-premises and as SaaS. That product does not contain database features such as user management, logging, locking, or backups.

Kett used the annual cost and CO2 emissions of storing 1 TB of data with six copies at Amazon as an example when comparing costs. A six-node RDS PostgreSQL cluster costs $27,048 and emits 3,608 kg CO2. Six copies of S3 at the standard tier cost $1,827, 93% less, and emit 5.88 kg CO2, 99.84% less. However, the calculation did not consider a potentially higher memory usage of the Java applications. The license cost of MicroStream Cluster for shared storage across multiple JVM instances also did not factor into the calculation.
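The percentages quoted above follow directly from the four figures in the comparison. A small sketch of the arithmetic, where `percentLess` is a helper name introduced here for illustration:

```java
import java.util.Locale;

class CostComparison {
    /** Percentage by which the alternative is lower than the baseline. */
    static double percentLess(double baseline, double alternative) {
        return (baseline - alternative) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double costSaving = percentLess(27_048, 1_827); // annual USD: RDS PostgreSQL vs. S3
        double co2Saving = percentLess(3_608, 5.88);    // annual kg CO2: RDS PostgreSQL vs. S3
        System.out.println(String.format(Locale.ROOT, "Cost: %.1f%% less", costSaving)); // Cost: 93.2% less
        System.out.println(String.format(Locale.ROOT, "CO2: %.2f%% less", co2Saving));   // CO2: 99.84% less
    }
}
```

The calculation covers storage only; as noted, additional application memory and MicroStream Cluster license costs would reduce the savings.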
