Neo4j - an Embedded, Network Database - InfoQ

Neo4j is an embedded, high-performance, lightweight persistence solution based on the network database model, and it has recently been attracting a lot of interest:

Neo is a netbase — a network-oriented database — that is, an embedded, disk-based, fully transactional Java persistence engine that stores data structured in networks rather than in tables. A network (or graph, in mathematical lingo) is a flexible data structure that allows a more agile and rapid style of development.

You can think of Neo as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.

What makes Neo interesting is its use of the so-called "network-oriented database" model. In this model, domain data is expressed in a "node space": a network of nodes, relationships and properties (key-value pairs), in contrast to the relational model's tables, rows and columns. Relationships are first-class objects and may also be annotated with properties, revealing the context in which nodes interact. The network model is well suited to problem domains that are naturally organized as hierarchies, for example Semantic Web applications. The creators of Neo found that hierarchical and semi-structured data is not well suited to the traditional relational database model:

The object-relational impedance mismatch makes it unnecessarily difficult and time consuming to squeeze an object-oriented “round object” into a relational “square table.”
The static, rigid and inflexible nature of the relational model makes it difficult to evolve schemas in order to meet changing business requirements. For the same reasons, the database often holds a team back when they try to apply agile software development methodologies by rapidly evolving the object-oriented layer.
The relational model is exceptionally poor at capturing semi-structured data, a type of information that industry analysts and researchers alike agree will be the next “big thing” in information management.
A network is a very efficient data storage structure. It's not a coincidence that the human brain is one huge network or that the world wide web is structured as an ad hoc network. The relational model can capture network-oriented data, but it is very weak when it comes to traversing that network in order to extract information.
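
To make the node space idea concrete, here is a minimal in-memory sketch in plain Java. It is not Neo's implementation or API: the `NodeSpaceSketch` class, its nested types and the `since` property are hypothetical names chosen for illustration. It only shows the shape of the model, in which both nodes and relationships carry key-value properties and a relationship is an object in its own right.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; these are not Neo's classes.
public class NodeSpaceSketch {

    /** Anything that can carry key-value properties. */
    public static class PropertyContainer {
        private final Map<String, Object> properties = new HashMap<>();

        public void setProperty(String key, Object value) {
            properties.put(key, value);
        }

        public Object getProperty(String key) {
            return properties.get(key);
        }
    }

    /** A node keeps a list of every relationship it takes part in. */
    public static class Node extends PropertyContainer {
        public final List<Relationship> relationships = new ArrayList<>();

        public Relationship createRelationshipTo(Node other, String type) {
            Relationship rel = new Relationship(this, other, type);
            relationships.add(rel);        // visible from the start node
            other.relationships.add(rel);  // and from the end node
            return rel;
        }
    }

    /** A relationship is a first-class object with its own properties. */
    public static class Relationship extends PropertyContainer {
        public final Node start;
        public final Node end;
        public final String type;

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        Node mrAnderson = new Node();
        mrAnderson.setProperty("name", "Thomas Anderson");

        Node morpheus = new Node();
        morpheus.setProperty("name", "Morpheus");

        // The relationship itself is annotated (the value is made up).
        Relationship knows = mrAnderson.createRelationshipTo(morpheus, "KNOWS");
        knows.setProperty("since", "first meeting");

        System.out.println(knows.start.getProperty("name") + " -[" + knows.type
                + "]-> " + knows.end.getProperty("name"));
    }
}
```

In a relational schema the same `KNOWS` fact would typically live in a join table; here it is an object reachable directly from either node, which is what makes hop-by-hop traversal cheap.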

While Neo is a relatively new open source project, it has been used in production applications with over 100 million nodes, relationships and properties, satisfying enterprise robustness and performance requirements:

full support for JTA and JTS, 2PC distributed ACID transactions, configurable isolation levels and battle tested transaction recovery. These aren't just words: Neo has been in production for more than three years in a highly demanding 24/7 environment. It is mature, robust and ready to be deployed.

The Java API consists of 12 classes. Creating a node is straightforward:

Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus,
    MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly
...
tx.commit();

Searching for nodes in the network is accomplished through a "traverser" framework:

// Instantiate a traverser that returns all mrAnderson’s friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_NETWORK,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    MatrixRelationshipTypes.KNOWS,
    Direction.OUTGOING);
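
As a rough picture of what such a traverser does, the following plain-Java sketch performs a breadth-first walk along outgoing "knows" edges until the network is exhausted and returns every node reached except the start node. It is an assumption-laden illustration, not Neo's traverser: `TraversalSketch`, `addKnows` and `friendsOf` are hypothetical names, nodes are reduced to strings, and the edges in `main` are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; it mimics the traverser's settings, not its code.
public class TraversalSketch {

    // Node name -> names reachable over one outgoing KNOWS relationship.
    private final Map<String, List<String>> knows = new HashMap<>();

    public void addKnows(String from, String to) {
        knows.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Breadth-first, outgoing only, to the end of the network, start node excluded. */
    public List<String> friendsOf(String start) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();

        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> neighbours =
                    knows.getOrDefault(current, Collections.<String>emptyList());
            for (String next : neighbours) {
                // Set.add returns false for nodes already seen, so cycles terminate.
                if (visited.add(next)) {
                    result.add(next);
                    queue.add(next);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TraversalSketch graph = new TraversalSketch();
        // Invented edges, for illustration only.
        graph.addKnows("Thomas Anderson", "Morpheus");
        graph.addKnows("Thomas Anderson", "Trinity");
        graph.addKnows("Morpheus", "Cypher");
        graph.addKnows("Cypher", "Agent Smith");

        // Prints [Morpheus, Trinity, Cypher, Agent Smith]
        System.out.println(graph.friendsOf("Thomas Anderson"));
    }
}
```

The breadth-first order means direct acquaintances come back before friends of friends, which corresponds to the `BREADTH_FIRST` setting in the snippet above.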

Neo4j has a dual licensing model: a free software license (GPL-style) and a commercial license, although no pricing information is available on the web page. The current version is 1.0 beta 6, and the next release of Neo4j is expected to be Release Candidate 1. Ruby and Python wrappers for Neo4j are also under development.
