Neo4j: Java-based NoSQL Graph Database

After several years of development, the developers from NeoTechnology have released version 1.0 of Neo4j, a Java-based graph database which follows the property graph data model. InfoQ spoke with NeoTechnology COO Peter Neubauer to learn more about the current Neo4j release and what it offers to developers.

The Neo4j kernel JAR, which weighs in at around 440 KB, is available both under the open-source AGPLv3 and under a commercial license; the commercial license is required if Neo4j is used in closed-source software. Information in Neo4j is represented using three basic building blocks:

Node (a.k.a. vertex) - Conceptually similar to an instance of an object; each Node has a unique ID
Relationship (a.k.a. edge) - Links two Nodes together; each Relationship has both a direction and a RelationshipType
Property (a.k.a. attribute) - A String key/Object value pair; Properties can exist on both Nodes and Relationships
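
The three building blocks above can be sketched in plain Java. This is only an illustrative in-memory model, not the Neo4j API: the class names (PropertyGraphSketch, Graph, Node, Relationship), the relate method, and the "KNOWS" type are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model only; these classes are hypothetical
// and are NOT the Neo4j API.
public class PropertyGraphSketch {

    /** A node: a unique ID plus String-key/Object-value properties. */
    static final class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(long id) {
            this.id = id;
        }
    }

    /** A relationship: directed (start -> end), typed, with its own properties. */
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Hands out unique node IDs and records relationships on their start node. */
    static final class Graph {
        private long nextId = 0;
        final Map<Long, Node> nodes = new HashMap<>();

        Node createNode() {
            Node node = new Node(nextId++);
            nodes.put(node.id, node);
            return node;
        }

        Relationship relate(Node start, Node end, String type) {
            Relationship relationship = new Relationship(start, end, type);
            start.outgoing.add(relationship);
            return relationship;
        }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();
        Node alice = graph.createNode();
        alice.properties.put("name", "Alice");
        Node bob = graph.createNode();
        bob.properties.put("name", "Bob");

        // Properties can sit on the relationship as well as on the nodes.
        Relationship knows = graph.relate(alice, bob, "KNOWS");
        knows.properties.put("since", 2008);

        System.out.println(alice.properties.get("name") + " -" + knows.type
                + "-> " + bob.properties.get("name"));
    }
}
```

Storing the outgoing relationships directly on each node is what makes following an edge a local lookup rather than a join against a separate table.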

Compared to relational databases, graph databases tend to do better with high volumes of complex, interconnected low-structure data which change rapidly and require frequent querying - in a relational model these queries lead to large numbers of table joins, which can cause performance issues. Neubauer explained this in more detail, saying:

Neo4j addresses the problem of performance degradation over queries that involve many joins in a traditional RDBMS. By modeling the data around graphs, Neo4j can traverse along nodes and edges with the same speed, independently of the amount of data constituting the graph. This gives secondary effects like very fast graph algos, recommender systems and OLAP-style analytics that are currently not possible with normal RDBMS setups.

Since Neo4j is a database, each access to the graph structure (read, write, or traversal) is managed by an ACID transaction system. Graph traversal is handled through a Traverser API, and indexing support is provided via integration with Lucene; an integration with Solr is in the works. A presentation by NeoTechnology CEO Emil Eifrem, which gives a more detailed introduction to Neo4j, is also available, as is an interview with Peter Neubauer.
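
To make the idea of traversal concrete, here is a minimal breadth-first traversal in plain Java. It is a sketch under stated assumptions and does not use or reproduce Neo4j's Traverser API: the TraversalSketch class, its integer node IDs, and the maxDepth parameter are hypothetical. It illustrates the point from the quote above that the work done depends on the neighbourhood actually visited, not on the total size of the graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a plain-Java breadth-first traversal,
// NOT Neo4j's Traverser API.
public class TraversalSketch {

    // Adjacency list: node ID -> IDs reachable over one outgoing relationship.
    private final Map<Integer, List<Integer>> outgoing = new HashMap<>();

    void relate(int from, int to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns the nodes reachable from start within maxDepth hops, breadth-first. */
    List<Integer> traverse(int start, int maxDepth) {
        List<Integer> visitedOrder = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> queue = new ArrayDeque<>(); // each entry is {nodeId, depth}
        seen.add(start);
        queue.add(new int[] {start, 0});

        while (!queue.isEmpty()) {
            int[] current = queue.poll();
            int node = current[0];
            int depth = current[1];
            if (depth > 0) {
                visitedOrder.add(node); // the start node itself is not reported
            }
            if (depth == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            List<Integer> neighbours =
                    outgoing.getOrDefault(node, Collections.<Integer>emptyList());
            for (int next : neighbours) {
                if (seen.add(next)) { // add() returns false if already visited
                    queue.add(new int[] {next, depth + 1});
                }
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        TraversalSketch graph = new TraversalSketch();
        graph.relate(1, 2);
        graph.relate(1, 3);
        graph.relate(2, 4);
        graph.relate(4, 5);
        // Node 5 is three hops away, so a depth limit of 2 never touches it.
        System.out.println(graph.traverse(1, 2));
    }
}
```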

When asked about his stance on the NoSQL movement, Neubauer replied:

We are definitely part of the NoSQL movement in that we are trying to solve problems that RDBMS are not addressing right now. That said, our focus is first the complexity of data, deep queries and analytics - operations that require many joins and sparse tables in RDBMS, and second the scalability and sharding type of problems that many of the other NoSQL projects are focusing on.

Neubauer indicated that, although the 1.0 release was recent, Neo4j has been used in production for as long as 7 years in some areas, and that the 1.0 version number was intended to indicate API stability rather than codebase stability. The performance of Neo4j was also touched upon: Neo4j can handle graphs with billions of objects in them without code modification, typically delivers around 2 million relationship reads per second, and performs shortest-path calculations which scale far better than with a relational database like MySQL. As with all performance benchmarks, many factors, including the underlying hardware and the dataset used, can cause dramatic changes in results.
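
For readers unfamiliar with shortest-path calculations on a graph, the following plain-Java sketch finds a shortest path (by hop count) using breadth-first search with parent pointers. The article does not say which algorithm Neo4j uses, so this is a generic textbook technique rather than Neo4j's implementation; the ShortestPathSketch class, the connect method, and the string node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: unweighted shortest path via breadth-first search.
// This is a generic technique, NOT Neo4j's own shortest-path implementation.
public class ShortestPathSketch {

    private final Map<String, List<String>> neighbours = new HashMap<>();

    /** Connects two nodes in both directions (undirected, for simplicity). */
    void connect(String a, String b) {
        neighbours.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Returns a path with the fewest hops, or an empty list if none exists. */
    List<String> shortestPath(String from, String to) {
        Map<String, String> parent = new HashMap<>(); // node -> node it was reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);

        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                // Walk the parent pointers back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            List<String> adjacent =
                    neighbours.getOrDefault(current, Collections.<String>emptyList());
            for (String next : adjacent) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ShortestPathSketch graph = new ShortestPathSketch();
        graph.connect("a", "b");
        graph.connect("b", "c");
        graph.connect("c", "d");
        graph.connect("a", "e");
        graph.connect("e", "d");
        System.out.println(graph.shortestPath("a", "d"));
    }
}
```

The search only expands nodes near the start until it reaches the target, whereas expressing the same query relationally would typically need one self-join per hop.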

In addition to the main Neo4j codebase, there is a community of contributors and users and a larger ecosystem, examples of which include:

Extensions - jo4neo (Java Objects for Neo), Gremlin (a programming language for working with graphs), and a REST/JSON interface
Framework integrations - Grails, Griffon, Django and Spring
Language bindings - JRuby, Python, Scala, PHP and Clojure

With respect to future plans for Neo4j, a recent round of funding has helped to drive development in several areas: enhancing the existing master/slave replication and online backup support to provide seamless high availability with eventual consistency and write-master re-election, better overall operations support, and full REST support (including dynamic JavaScript-based traversal and a read-only mode for data publishing). Longer-term plans include sharding support, which brings a new set of challenges to the Neo4j codebase. Emil Eifrem also stressed the importance of the large and rapidly growing community of users and developers, which has created hundreds of Neo4j projects.