Neo4j: Java-based NoSQL Graph Database
After several years of development, the team at NeoTechnology has released version 1.0 of Neo4j, a Java-based graph database which follows the property graph data model. InfoQ spoke with NeoTechnology COO Peter Neubauer to learn more about the current Neo4j release and what it offers to developers.
The Neo4j kernel JAR, which weighs in at around 440k, is available both under the open-source AGPLv3 and as a commercially licensed version, with a commercial license required if Neo4j is used in closed-source software. Information in Neo4j is represented using three basic building blocks:
- Node (a.k.a. vertex) - This is similar conceptually to an instance of an object, and it has a unique ID
- Relationship (a.k.a. edge) - This links together two Nodes, and it has both a direction and a RelationshipType
- Property (a.k.a. attribute) - These are key-value pairs (a String key mapped to an Object value) which can exist on both Nodes and Relationships
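The three building blocks above can be pictured with a small in-memory model. The following is only a minimal sketch of the property graph idea: the class names (`PropertyGraphSketch`, `PropertyContainer`, `Node`, `Relationship`) and methods are hypothetical and are not the Neo4j API, and a plain `String` stands in for a relationship type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of the property graph model.
// None of these classes are part of Neo4j itself.
final class PropertyGraphSketch {

    /** Both nodes and relationships can carry key-value properties. */
    static class PropertyContainer {
        private final Map<String, Object> properties = new HashMap<>();

        void setProperty(String key, Object value) {
            properties.put(key, value);
        }

        Object getProperty(String key) {
            return properties.get(key);
        }
    }

    /** A node (vertex) with a unique ID and a list of its outgoing relationships. */
    static final class Node extends PropertyContainer {
        final long id;
        final List<Relationship> outgoing = new ArrayList<>();

        Node(long id) {
            this.id = id;
        }
    }

    /** A directed, typed relationship (edge) from a start node to an end node. */
    static final class Relationship extends PropertyContainer {
        final Node start;
        final Node end;
        final String type; // assumption: a String stands in for a relationship type

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    private long nextId = 0;

    /** Creates a node with the next unused ID. */
    Node createNode() {
        return new Node(nextId++);
    }

    /** Links two nodes and records the relationship on the start node. */
    Relationship relate(Node start, Node end, String type) {
        Relationship relationship = new Relationship(start, end, type);
        start.outgoing.add(relationship);
        return relationship;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        Node alice = graph.createNode();
        alice.setProperty("name", "Alice");
        Node bob = graph.createNode();
        bob.setProperty("name", "Bob");

        Relationship knows = graph.relate(alice, bob, "KNOWS");
        knows.setProperty("since", 2009);

        System.out.println(alice.getProperty("name") + " -[" + knows.type + "]-> "
                + bob.getProperty("name"));
    }
}
```

Note that properties live on the relationship as well as on the nodes, which is what distinguishes a property graph from a bare node-and-edge structure.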
Compared to relational databases, graph databases tend to do better with high volumes of complex, interconnected low-structure data which change rapidly and require frequent querying - in a relational model these queries lead to large numbers of table joins, which can cause performance issues. Neubauer explained this in more detail, saying:
Neo4j addresses the problem of performance degradation over queries that involve many joins in a traditional RDBMS. By modeling the data around graphs, Neo4j can traverse along nodes and edges with the same speed, independently of the amount of data constituting the graph. This gives secondary effects like very fast graph algos, recommender systems and OLAP-style analytics that are currently not possible with normal RDBMS setups.
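One way to picture the difference Neubauer describes is a "friends of friends" query. The sketch below is a deliberately naive illustration, not a description of how Neo4j or any RDBMS is implemented: the relational side is modelled as repeated scans over an unindexed join table (a real database would normally use indexes), while the graph side follows each node's own adjacency list, so the work depends on the number of neighbours rather than on the total number of rows. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical comparison; neither method reflects a real database engine.
final class JoinVersusTraversalSketch {

    /** Join-table style: every hop scans all (from, to) rows. */
    static Set<Integer> friendsOfFriendsByScan(int[][] rows, int start) {
        Set<Integer> friends = new TreeSet<>();
        for (int[] row : rows) {
            if (row[0] == start) {
                friends.add(row[1]);
            }
        }
        Set<Integer> result = new TreeSet<>();
        for (int[] row : rows) {
            if (friends.contains(row[0]) && row[1] != start) {
                result.add(row[1]);
            }
        }
        return result;
    }

    /** Builds per-node adjacency lists from the same (from, to) rows. */
    static Map<Integer, List<Integer>> toAdjacency(int[][] rows) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        for (int[] row : rows) {
            adjacency.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        return adjacency;
    }

    /** Graph style: only the neighbours of the visited nodes are touched. */
    static Set<Integer> friendsOfFriendsByTraversal(Map<Integer, List<Integer>> adjacency, int start) {
        Set<Integer> result = new TreeSet<>();
        for (int friend : adjacency.getOrDefault(start, Collections.<Integer>emptyList())) {
            for (int friendOfFriend : adjacency.getOrDefault(friend, Collections.<Integer>emptyList())) {
                if (friendOfFriend != start) {
                    result.add(friendOfFriend);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {2, 3}, {2, 4}, {5, 6}};
        System.out.println(friendsOfFriendsByScan(rows, 1));
        System.out.println(friendsOfFriendsByTraversal(toAdjacency(rows), 1));
    }
}
```

Both methods return the same result; the difference is that adding unrelated rows (such as the `{5, 6}` row) makes the scan longer but leaves the traversal's work unchanged.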
Since Neo4j is a database, every access to the graph structure, whether a read, a write, or a traversal, is managed by an ACID transaction system. Graph traversal is handled through a Traverser API. Indexing support is provided via integration with Lucene, and an integration with Solr is in the works. A presentation by NeoTechnology CEO Emil Eifrem, which gives a more detailed introduction to Neo4j, is also available, as is an interview with Peter Neubauer.
When asked about his stand on the NoSQL movement, Neubauer replied:
We are definitely part of the NoSQL movement in that we are trying to solve problems that RDBMS are not addressing right now. That said, our focus is first the complexity of data, deep queries and analytics - operations that require many joins and sparse tables in RDBMS, and second the scalability and sharding type of problems that many of the other NoSQL projects are focusing on.
Neubauer indicated that, although the 1.0 release is recent, Neo4j has been used in production for as long as 7 years in some areas, and that the 1.0 designation was intended to indicate API stability rather than codebase stability. The performance of Neo4j was also touched upon: Neo4j can handle graphs with billions of objects without code modification, normally sustains around 2 million relationship reads per second, and performs shortest-path calculations which scale far better than with a relational database like MySQL (although, as with all performance benchmarks, many factors including the underlying hardware and dataset used can cause dramatic changes in results).
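For readers unfamiliar with shortest-path calculations on a graph, the following is a minimal sketch of one common approach, breadth-first search over an unweighted, directed adjacency map. It is an assumption-laden illustration only: the source does not say which algorithm Neo4j uses, and the class and method names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical breadth-first shortest path; not taken from Neo4j.
final class ShortestPathSketch {

    /**
     * Returns the nodes on a path with the fewest hops from start to goal,
     * or an empty list if goal cannot be reached.
     */
    static List<String> shortestPath(Map<String, List<String>> adjacency, String start, String goal) {
        Map<String, String> previous = new HashMap<>(); // node -> node it was reached from
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the back-pointers from goal to start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String step = goal; step != null; step = previous.get(step)) {
                    path.addFirst(step);
                }
                return path;
            }
            for (String next : adjacency.getOrDefault(current, Collections.<String>emptyList())) {
                if (visited.add(next)) { // add() is true only on the first visit
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("A", List.of("B", "C"));
        adjacency.put("B", List.of("D"));
        adjacency.put("C", List.of("D"));
        adjacency.put("D", List.of("E"));
        System.out.println(shortestPath(adjacency, "A", "E"));
    }
}
```

The search only ever follows relationships outward from nodes it has already reached, so unrelated parts of the graph are never touched.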
A number of related components are also available for Neo4j:
- Extensions - jo4neo (Java Objects for Neo), Gremlin (a programming language for working with graphs), and a REST/JSON interface
- Framework integrations - Grails, Griffon, Django and Spring
- Language bindings - JRuby, Python, Scala, PHP and Clojure