Neo4j - an Embedded, Network Database

by Gavin Terrill on Jun 05, 2008. Estimated reading time: 3 minutes

Neo4j is an embedded, high performance and lightweight persistence solution based on the network database model that has recently been gaining a lot of interest:

Neo is a netbase — a network-oriented database — that is, an embedded, disk-based, fully transactional Java persistence engine that stores data structured in networks rather than in tables. A network (or graph, in mathematical lingo) is a flexible data structure that allows a more agile and rapid style of development.

You can think of Neo as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.

What makes Neo interesting is its use of the so-called "network-oriented database" model. In this model, domain data is expressed in a "node space": a network of nodes, relationships and properties (key-value pairs), in contrast to the relational model's tables, rows and columns. Relationships are first-class objects and may also be annotated with properties, revealing the context in which nodes interact. The network model is well suited to problem domains that are naturally hierarchically organized, for example Semantic Web applications. The creators of Neo found that hierarchical and semi-structured data is not well suited to the traditional relational database model:

    1. The object-relational impedance mismatch makes it unnecessarily difficult and time consuming to squeeze an object-oriented “round object” into a relational “square table.”
    2. The static, rigid and inflexible nature of the relational model makes it difficult to evolve schemas in order to meet changing business requirements. For the same reasons, the database often holds a team back when they try to apply agile software development methodologies by rapidly evolving the object oriented layer.
    3. The relational model is exceptionally poor at capturing semi structured data, a type of information that industry analysts and researchers alike agree will be the next “big thing” in information management.
    4. A network is a very efficient data storage structure. It's not a coincidence that the human brain is one huge network or that the world wide web is structured as an ad hoc network. The relational model can capture network-oriented data, but it is very weak when it comes to traversing that network in order to extract information.
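
To make the node space idea more concrete, here is a minimal in-memory sketch in plain Java of nodes, typed relationships and key-value properties, with relationships modelled as first-class objects that carry their own properties. This is only an illustration under stated assumptions: the class names (NodeSpaceSketch, Node, Relationship), the relateTo method and the "since" property are hypothetical and are not part of the Neo4j API, and nothing here is persisted or transactional.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory illustration of a "node space". Not the Neo4j API.
class NodeSpaceSketch {

    // A node holds arbitrary key-value properties and the relationships it takes part in.
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> relationships = new ArrayList<>();

        // Creates a directed, typed relationship from this node to another node
        // and registers it on both endpoints.
        Relationship relateTo(Node other, String type) {
            Relationship relationship = new Relationship(this, other, type);
            relationships.add(relationship);
            other.relationships.add(relationship);
            return relationship;
        }
    }

    // A relationship is a first-class object with its own properties.
    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Builds a two-node network and returns the relationship connecting them.
    static Relationship example() {
        Node anderson = new Node();
        anderson.properties.put("name", "Thomas Anderson");
        Node morpheus = new Node();
        morpheus.properties.put("name", "Morpheus");

        Relationship knows = anderson.relateTo(morpheus, "KNOWS");
        // Assumed property, purely to show that relationships can be annotated.
        knows.properties.put("since", "illustrative value");
        return knows;
    }

    public static void main(String[] args) {
        Relationship knows = example();
        System.out.println(knows.start.properties.get("name")
                + " -" + knows.type + "-> "
                + knows.end.properties.get("name")
                + " " + knows.properties);
    }
}
```

Note that no schema is declared anywhere: each node or relationship can carry a different set of properties, which is the flexibility the list above contrasts with relational tables.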

While Neo is a relatively new open source project, it has been used in production applications with over 100 million nodes, relationships and properties, satisfying enterprise robustness and performance requirements:

full support for JTA and JTS, 2PC distributed ACID transactions, configurable isolation levels and battle tested transaction recovery. These aren't just words: Neo has been in production for more than three years in a highly demanding 24/7 environment. It is mature, robust and ready to be deployed.

The Java API consists of 12 classes. Creating a node is straightforward:

Transaction tx = Transaction.begin();
EmbeddedNeo neo = ... // Get factory
// Create Thomas 'Neo' Anderson
Node mrAnderson = neo.createNode();
mrAnderson.setProperty( "name", "Thomas Anderson" );
mrAnderson.setProperty( "age", 29 );
// Create Morpheus
Node morpheus = neo.createNode();
morpheus.setProperty( "name", "Morpheus" );
morpheus.setProperty( "rank", "Captain" );
morpheus.setProperty( "occupation", "Total bad ass" );
// Create a relationship representing that they know each other
mrAnderson.createRelationshipTo( morpheus,
    MatrixRelationshipTypes.KNOWS );
// Create Trinity, Cypher, Agent Smith, Architect similarly

Searching for nodes in the network is accomplished through a "traverser" framework:

// Instantiate a traverser that returns all mrAnderson’s friends
Traverser friendsTraverser = mrAnderson.traverse(
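
As a rough illustration of the idea behind a friends traverser (and not of the Neo4j traverser API, whose call is only partially shown above), the following plain-Java sketch walks outgoing KNOWS relationships breadth first and returns every reachable node except the start node. All names here (FriendsTraversalSketch, addKnows, friendsOf) are hypothetical, and the relationships between the characters are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a breadth-first "friends" traversal. Not the Neo4j API.
class FriendsTraversalSketch {

    // Adjacency list: node name -> names it has an outgoing KNOWS relationship to.
    private final Map<String, List<String>> knows = new HashMap<>();

    void addKnows(String from, String to) {
        knows.computeIfAbsent(from, key -> new ArrayList<>()).add(to);
    }

    // Visits nodes level by level along KNOWS relationships and returns
    // every node reached, excluding the start node itself.
    List<String> friendsOf(String start) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> neighbours = knows.getOrDefault(current, Collections.<String>emptyList());
            for (String next : neighbours) {
                // Set.add returns false for nodes already seen, so cycles terminate.
                if (visited.add(next)) {
                    result.add(next);
                    queue.add(next);
                }
            }
        }
        return result;
    }

    // Sample network; the edges are invented for illustration only.
    static FriendsTraversalSketch sampleNetwork() {
        FriendsTraversalSketch graph = new FriendsTraversalSketch();
        graph.addKnows("Thomas Anderson", "Morpheus");
        graph.addKnows("Thomas Anderson", "Trinity");
        graph.addKnows("Morpheus", "Trinity");
        graph.addKnows("Morpheus", "Cypher");
        graph.addKnows("Cypher", "Agent Smith");
        return graph;
    }

    public static void main(String[] args) {
        for (String friend : sampleNetwork().friendsOf("Thomas Anderson")) {
            System.out.println(friend);
        }
    }
}
```

The point of the sketch is that the query follows relationships outward from a starting node rather than joining sets, which is the access pattern the article says the relational model handles poorly.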

Neo4j has a dual licensing model: free software (GPL Style) and commercial (although no pricing information is available on the web page). Currently at version 1.0 beta 6, the next release of Neo4j is expected to be Release Candidate 1. Ruby and Python wrappers for Neo4j are also under development.

Neo4j - an Embedded, Network Database by Milan Vasic

What is the difference between JCR and Neo4j?

On documentation and licensing by Emil Eifrem


I'm one of the founders of the community and Neo Technology -- the commercial entity backing it. Thanks for the post! We're a bit behind on our documentation (as usual), but it's great to see that people can still pick up the concepts as well as the code! We have one person working full time this summer just to improve community documentation, so if you start playing around with Neo4j then feel very welcome to join the mailing list and give feedback on what's good/bad/missing/etc with the documentation. It will be much appreciated!

Before we regress into licensing flame wars -- Neo4j is dual licensed with a viral open source license, much like for example Db4o and BerkeleyDB. We're using the Free Software Foundation's AGPLv3, which very simply put is an "extension" of the GPL that closes the so-called "ASP loophole." (For more info, google it or drop me a mail privately.)

I know there are many views on this, both legal-technical as well as ethical, and they flounder when a bunch of non-lawyers (like myself!) debate it. Suffice to say, our intent with Neo's licensing is this:

  • if you write free or open source software: great, we want you to
    be able to use Neo gratis and under a free software license,

  • if you write proprietary software: you're not unlikely to be
    making money off of it, and then we think it's fair to ask you to
    purchase a commercial license.

It's not perfect, but we believe it's a good approximation of fair and
ethical as well as commercially viable. YMMV.

As for our corporate web site, it is scheduled to be released with full information on June 30. You guys caught us a bit off guard! :)


Re: Neo4j - an Embedded, Network Database by Emil Eifrem


I'm not an expert on JCR by far, but I believe that JCR's core abstraction is a hierarchical view (i.e. tree) of the world whereas Neo sees the world as a graph/network. So that's at least one fundamental difference that would make a big impact on how you model your domain.

But I'm sure others with more expert opinions can chime in?


Re: On documentation and licensing by Neil Ellis

I don't think you need to be too defensive Emil; I think the open source community is starting to get to grips with the need for open source vendors to have the dual license model to protect against aggressive OEMing by closed source vendors - I for one think you've made a big enough leap going GPL and wish you great success. I am aware of but haven't (yet) used your product and think it's a great idea. Certainly there seems to be an ever growing need for associative data in highly distributed systems; the constant increase in the number and types of embedded devices (never mind RFID) is undoubtedly going to fuel the need for graph based data and ultimately the relational databases just won't scale to massive numbers of data sources with highly volatile data.

I think graph based data and temporal queries (like CEP) are going to be two of the biggest technologies of the next decade as we go for massively distributed event based systems to deal with the ever increasing volumes and volatility of information.

All the best


Comparison to RDF stores by Jonny Wray


I appreciate the information about Neo, it meshes very well with a project I'm currently working on. Do you have anything to say about how this fits in with RDF datastores which are also a graph/network based model, comparisons between the approaches etc.?

I'm currently using Sesame for my datastore and API, and while it provides support for a lot of the constructs I need, there isn't anything like the traversal API of Neo, and such an API would be very useful for some use cases we have in mind.

Conversely, there are obviously a number of features that Sesame supports by virtue of being RDF-based that aren't supported in Neo.


Re: On documentation and licensing by Emil Eifrem

"I think graph based data and temporal queries (like CEP) are going to be two of the biggest technologies of the next decade"

I couldn't agree more. Interesting connection with regards to how embedded devices will drive associative data, hadn't really thought about it that way before.

In addition to associativity and temporality, I believe another big emerging trend for the next decade is that of data semi-structure. We're entering an age now when anyone can add any tag to any photo, where any user can attach RDF meta data to any resource on the web without consulting a central authority. As the content creation process is becoming increasingly decentralized, every content item is getting a potentially unique schema. I think we'll be hard pressed to squeeze that into tables.


Re: Comparison to RDF stores by Emil Eifrem

"Do you have anything to say about how this fits in with RDF datastores which are also a graph/network based model"

Hi Jonny,

Neo's data model actually maps very well to RDF. We're a graph, after all. In fact, we already have some components for exposing Neo as an RDF store, for example through a SAIL API and a SPARQL endpoint. Now, be warned: these components aren't yet productified and documented. But we have two commercial customers using them already to dump RDF/XML-formatted data into Neo and executing SPARQL queries to get it out.

We expect them to be productified and documented in early fall, but I think they're usable now with some patience. Feel free to join the mailing list and we'll be glad to get you up and running!


Re: On documentation and licensing by Neil Ellis

"As the content creation process is becoming increasingly decentralized, every content item is getting a potentially unique schema"

Hi again Emil, so how does Neo provide controls and limits on the data added? Is there an equivalent to schemas optionally available, or is there a different way to keep data clean when required?



Re: On documentation and licensing by Rickard Öberg

There is a plugin to Qi4j so that you can use Neo4j as the datastore for your entities. You would then model your domain using Qi4j (i.e. Java), and that would map more or less automatically to a Neo4j database. This gives a good tradeoff between the need for stability while still being able to use the dynamicity of Neo4j.

Re: On documentation and licensing by Peter Neubauer

Hi there,
there is the Neo meta model, which is even the basis for the RDF capabilities being modelled on top of it. See here for pointers and details.



Multi-process? by Peter Monks

Does Neo4j support multiple independent JVMs (potentially on different machines) all accessing and manipulating the same graph concurrently? If so, how does it handle cross-cluster communication (for distributed locking, transactional semantics, cache management, etc.)?

Re: On documentation and licensing by Neil Ellis

Thanks Peter, at first glance that looks pretty spot on.

Rickard, I can see why you would want to integrate Qi4j with Neo4j; the traversing mentality works much better than set queries for quite a few problem domains, doesn't it? It's funny how a particular technology (RDBMS and SQL especially) trains your mind to think in a particular way, so we get various OO style SQL dialects rather than graph traversal languages. It's certainly piqued my interest; now if you had a single query language which supported graph traversal and set queries....(Cue: response from someone who knows one :-) )

I tell you InfoQ does seem to pick some decent technologies to highlight.

Re: On documentation and licensing by Rickard Öberg

Neil, the "native" query language in Qi4j is going to be SPARQL, which I think is decent for both graph and set queries.

Re: On documentation and licensing by Peter Neubauer

When it comes to RDF-based data and "graphy" traversal of it, there are some interesting, yet RDF-centric approaches, e.g. Ripple, coming from here. However, at Neo4j the view is that OO graph manipulation, representation and traversal is more powerful, type safe and convenient for a developer than RDF based programming even if there are very interesting approaches around.



Re: On documentation and licensing by Neil Ellis

I was looking at SPARQL myself actually, very interesting.

Re: On documentation and licensing by Neil Ellis

Hi Peter

Cool links btw, much appreciated, a little more research for me to do.

"However, at Neo4j the view is that OO graph manipulation, representation and traversal is more powerful, type safe and convenient for a developer than RDF based programming"

Indeed, however the value of DSLs like SQL and so forth is that the queries are easier to comprehend and can be highly succinct. A fantastic example of this of course is Esper, whose query language is, well, bloody amazing to be honest. So in Einstein I can write code like:

listen to "time(schedule=cron):0/1 * * * * ?" {

execute "java:org.cauldron.einstein.ri.examples.esper.EventMaker";

listen payload to "esper:select avg(value) from sec)" {
extract "xpath:/underlying" >> "console:Average value is: ";


(This basically causes a widget with a value to be emitted every second and passed to Esper, with the Esper query giving me a running average over 30 seconds.) I think it's pretty easy to follow what Esper is doing here.

However I agree there are times when you want to be as safe as humanly possible and have finer control.

All the best, Neil

Re: Multi-process? by Emil Eifrem

Hi Peter,

We're working on a distributed Neo4j, which will support partitioning the graph onto multiple JVMs (so potentially on multiple machines). But this is not our top priority right now and it won't have production quality until 2.0. Our main focus atm is getting a kickass 1.0-final out the door. But we have several commercial customers who will need this in the 1-2 year time frame so it's definitely a direction where we will go.

It may be interesting to know that Neo4j can be deployed in the distributed OSGi runtime Newton as well as in its commercial counterpart Infiniflow. They are excellent environments for distributed apps and work well with Neo4j today.


Re: Multi-process? by Neil Ellis

Indeed, Einstein is also being built to work seamlessly with Newton/Infiniflow. I came across this project/product a year ago; they were still a bit ahead of the market back then, but now it seems the market has very much caught up. It's a good way of quickly turning a non-distributed app into a distributed app - you'll see what I mean if you try it.

Difference with OO Databases? by Gabriel Kastenbaum

Can Neo4j be called an object-oriented database?

Re: Difference with OO Databases? by Emil Eifrem

Hi Gabriel,

Neo4j is not an object-oriented database and not about transparent persistence tied to a specific OO environment. We believe in separating data and logic much like the brains behind the relational model did. But unlike them, we think that a graph model is a better "natural" representation of most domains than relational sets ("tables") are.

We think that it's a LOT easier to map OO abstractions to a graph model though. I think this is because the OO abstractions are typically modeled after the real world (at least the entity abstractions) and many domains seem to actually be graphs.

Since we don't couple the persistent state of an application (i.e. its data) to the current OO implementation of it, the data layer becomes much more independent from the business layer. I think this is a sound architecture. The fact that we're not tied to a specific OO environment also allows us to integrate very well with systems that aren't strictly OO like Qi4j, Python and Ruby.


Re: Difference with OO Databases? by Emil Eifrem

than relational sets ("tables")

Actually, make that "relations" rather than "sets."


Re: Multi-process? by Peter Monks

What about fault tolerance / high availability? Partitioning, while good for scalability, doesn't help with those concerns.

Re: Multi-process? by Emil Eifrem

Hi Peter,

In our current (very prototype) distribution code line, you can declaratively assign segments of the node space to different machines. Every node space segment has a "master," which "owns" that segment (though it may be replicated on multiple other machines). For an HA scenario with two machines, you would designate one machine the owner of the ENTIRE node space and then writes would be synchronously or asynchronously cascaded to the slave, while reads would be dispatched to either one.

So HA is a functional subset of the fully distributed Neo 2.0 kernel. Our current roadmap has that HA functionality as part of the 2.0 release. We've had tentative discussions with some customers about pushing it ahead a bit (i.e. only HA bit as a standalone part before the 2.0 kernel), but nothing final. But if you need it for a commercial project, feel free to drop me a note and we can discuss if there's anything we can do.



Re: Comparison to RDF stores by Oleg Aravin

Can you give any example code showing how to run a SPARQL query against Neo?
