InfoQ

Neo4j: Java-based NoSQL Graph Database

Posted by Michael Hunger on Feb 25, 2010

After several years of development, the developers at NeoTechnology have released version 1.0 of Neo4j, a Java-based graph database that follows the property graph data model. InfoQ spoke with NeoTechnology COO Peter Neubauer to learn more about the release and what it offers developers.

The Neo4j kernel JAR, which weighs in at around 440 KB, is available both under the open-source AGPLv3 license and under a commercial license; the commercial license is required if Neo4j is used in closed-source software. Information in Neo4j is represented using three basic building blocks:

  • Node (a.k.a. vertex) - conceptually similar to an object instance; each node has a unique ID
  • Relationship (a.k.a. edge) - links two nodes and has both a direction and a RelationshipType
  • Property (a.k.a. attribute) - a String key/Object value pair; properties can be attached to both nodes and relationships
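To make the three building blocks concrete, here is a minimal in-memory sketch in Java. The names (`PropertyGraphSketch`, `Graph`, `createNode`, `createRelationshipTo`) are hypothetical and only loosely echo Neo4j's vocabulary, and relationship types are modelled as plain strings for brevity. It is an illustration of the data model, not Neo4j's API, and it has no persistence or transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the property graph building blocks.
// This is NOT Neo4j's API; it only illustrates Node / Relationship / Property.
public class PropertyGraphSketch {

    // Properties: String keys mapped to Object values, shared by nodes and relationships.
    static class PropertyContainer {
        private final Map<String, Object> properties = new HashMap<>();

        void setProperty(String key, Object value) { properties.put(key, value); }

        Object getProperty(String key) { return properties.get(key); }
    }

    // A node has a unique id and keeps direct references to its own relationships.
    static class Node extends PropertyContainer {
        final long id;
        final List<Relationship> relationships = new ArrayList<>();

        Node(long id) { this.id = id; }

        // Creates a directed, typed relationship from this node to another node
        // and registers it on both endpoints.
        Relationship createRelationshipTo(Node other, String type) {
            Relationship rel = new Relationship(this, other, type);
            this.relationships.add(rel);
            other.relationships.add(rel);
            return rel;
        }
    }

    // A relationship links exactly two nodes, has a direction (start -> end) and a type.
    static class Relationship extends PropertyContainer {
        final Node start;
        final Node end;
        final String type; // assumption: a plain string stands in for RelationshipType

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Hands out unique node ids.
    static class Graph {
        private long nextId = 0;

        Node createNode() { return new Node(nextId++); }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();
        Node alice = graph.createNode();
        alice.setProperty("name", "Alice");
        Node bob = graph.createNode();
        bob.setProperty("name", "Bob");

        Relationship knows = alice.createRelationshipTo(bob, "KNOWS");
        knows.setProperty("since", 2008);

        // prints "Alice -[KNOWS]-> Bob since 2008"
        System.out.println(alice.getProperty("name") + " -[" + knows.type + "]-> "
                + knows.end.getProperty("name") + " since " + knows.getProperty("since"));
    }
}
```

Storing the relationship on both endpoints is what lets later traversals start from either node without a global lookup.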

Compared to relational databases, graph databases tend to do better with high volumes of complex, interconnected, loosely structured data that changes rapidly and is queried frequently. In a relational model, such queries lead to large numbers of table joins, which can cause performance problems. Neubauer explained this in more detail:

Neo4j addresses the problem of performance degradation over queries that involve many joins in a traditional RDBMS. By modeling the data around graphs, Neo4j can traverse along nodes and edges with the same speed, independently of the amount of data constituting the graph. This gives secondary effects like very fast graph algos, recommender systems and OLAP-style analytics that are currently not possible with normal RDBMS setups.
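The claim about traversal speed rests on each node holding direct references to its neighbours, so a query such as "friends of friends" only touches the relationships of the nodes it actually visits instead of joining whole tables. The following Java sketch is a hypothetical illustration (`FriendsOfFriends`, `Person` and `friendsOfFriends` are invented names, not part of Neo4j): a two-hop lookup whose work depends on the number of relationships of the visited nodes, not on the total number of people stored.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: nodes hold direct references to their neighbours,
// so each hop only looks at the current node's own relationships.
public class FriendsOfFriends {

    static class Person {
        final String name;
        final List<Person> knows = new ArrayList<>();

        Person(String name) { this.name = name; }

        // Records an undirected friendship on both people.
        void addFriend(Person other) {
            knows.add(other);
            other.knows.add(this);
        }
    }

    // Returns the names of people exactly two hops away, excluding the start
    // person and their direct friends. Only the friends' adjacency lists are
    // read; people outside that neighbourhood are never scanned.
    static Set<String> friendsOfFriends(Person start) {
        Set<Person> direct = new HashSet<>(start.knows);
        Set<String> result = new TreeSet<>(); // sorted for stable output
        for (Person friend : start.knows) {
            for (Person fof : friend.knows) {
                if (fof != start && !direct.contains(fof)) {
                    result.add(fof.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Person ann = new Person("Ann");
        Person ben = new Person("Ben");
        Person cat = new Person("Cat");
        Person dan = new Person("Dan");
        ann.addFriend(ben);
        ben.addFriend(cat);
        ben.addFriend(dan);
        cat.addFriend(dan);
        System.out.println(friendsOfFriends(ann)); // prints [Cat, Dan]
    }
}
```

In a relational schema the same question would typically be a self-join on a friendship table, where each lookup goes through an index over all rows; here the cost is bounded by the local neighbourhood.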

Because Neo4j is a database, every access to the graph structure (read, write, and traversal) is managed by an ACID transaction system. Graph traversal is handled through a Traverser API, indexing is provided through integration with Lucene, and an integration with Solr is in the works. A presentation by NeoTechnology CEO Emil Eifrem that gives a more detailed introduction to Neo4j is also available, as is an interview with Peter Neubauer.
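The all-or-nothing part of those transactional guarantees can be illustrated with a toy store loosely modelled on the `beginTx()` / `success()` / `finish()` pattern of Neo4j's 1.x embedded Java API: writes are buffered and applied at `finish()` only if the transaction was marked successful. This is a hypothetical sketch (`TxSketch`, `Store` and `Tx` are invented names); it covers atomicity only, not isolation, durability, or Neo4j's actual transaction manager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of atomic writes with a begin / success / finish pattern.
// A toy key-value store, not Neo4j's transaction implementation.
public class TxSketch {

    static class Store {
        final Map<String, Object> committed = new HashMap<>();

        Tx beginTx() { return new Tx(this); }
    }

    static class Tx {
        private final Store store;
        private final Map<String, Object> pending = new HashMap<>();
        private boolean success = false;

        Tx(Store store) { this.store = store; }

        // Buffers a write; it is invisible in the store until finish().
        void set(String key, Object value) { pending.put(key, value); }

        // Marks the transaction as successful; nothing is written yet.
        void success() { success = true; }

        // Applies all buffered writes if success() was called, otherwise discards them.
        void finish() {
            if (success) {
                store.committed.putAll(pending);
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        Store store = new Store();

        Tx tx = store.beginTx();
        try {
            tx.set("alice.name", "Alice");
            tx.success();
        } finally {
            tx.finish(); // committed: success() was reached
        }

        Tx failed = store.beginTx();
        try {
            failed.set("bob.name", "Bob");
            throw new IllegalStateException("simulated failure before success()");
        } catch (IllegalStateException e) {
            // success() was never called, so finish() will discard the write
        } finally {
            failed.finish();
        }

        System.out.println(store.committed); // prints {alice.name=Alice}
    }
}
```

Calling `finish()` from a `finally` block guarantees the transaction is always ended, whether or not the work in the `try` block completed.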

When asked about his stance on the NoSQL movement, Neubauer replied:

We are definitely part of the NoSQL movement in that we are trying to solve problems that RDBMS are not addressing right now. That said, our focus is first the complexity of data, deep queries and analytics - operations that require many joins and sparse tables in RDBMS, and second the scalability and sharding type of problems that many of the other NoSQL projects are focusing on.

Neubauer indicated that, although the 1.0 release was recent, Neo4j had been used in production for as long as seven years in some areas, and that the 1.0 label was meant to indicate API stability rather than codebase stability. He also touched on performance: Neo4j can handle graphs with billions of objects without code modification, typically achieves around 2 million relationship reads per second, and performs shortest-path calculations that scale far better than on a relational database such as MySQL. As with all performance benchmarks, however, many factors, including the underlying hardware and the dataset used, can change the results dramatically.
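On an unweighted graph, a shortest-path query reduces to a breadth-first search that expands outward from the start node one hop at a time; the first time the target is reached, the path found is a shortest one. The sketch below is a generic BFS in Java over hypothetical string-keyed adjacency lists (`ShortestPathSketch`, `connect` and `shortestPath` are invented names). It is not Neo4j's own shortest-path implementation and says nothing about the performance figures quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: unweighted shortest path via breadth-first search.
public class ShortestPathSketch {

    // Undirected graph stored as adjacency lists keyed by node name.
    private final Map<String, List<String>> adjacency = new HashMap<>();

    // Adds an undirected edge by recording each endpoint in the other's list.
    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Returns one shortest path from 'from' to 'to', or an empty list if none exists.
    List<String> shortestPath(String from, String to) {
        Map<String, String> parent = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                // Walk parent links back to the start, then reverse into path order.
                List<String> path = new ArrayList<>();
                for (String n = to; n != null; n = parent.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ShortestPathSketch g = new ShortestPathSketch();
        g.connect("A", "B");
        g.connect("B", "C");
        g.connect("C", "D");
        g.connect("A", "E");
        g.connect("E", "D");
        System.out.println(g.shortestPath("A", "D")); // prints [A, E, D]
    }
}
```

Because each step only follows the current node's adjacency list, the search touches the nodes and relationships it reaches rather than scanning a join table for every hop.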

In addition to the main Neo4j codebase, there is a community of contributors and users and a wider ecosystem, examples of which include:

As for future plans, a recent round of funding is helping to drive development, including enhancements to the existing master/slave replication and online backup support to provide seamless high availability with eventual consistency and write-master re-election, better overall operations support, and full REST support (including dynamic JavaScript-based traversal and a read-only mode for data publishing). Longer-term plans include sharding support, which brings a new set of challenges to the Neo4j codebase. Emil Eifrem also stressed the importance of a large and rapidly growing community of users and developers, which had already created hundreds of Neo4j projects.

  • This article is part of a featured topic series on NoSQL

    graph database & network data model db?

    by Douwe Vonk

    It looks a lot like the old (1970s/80s) network data model databases: traversing record sets via pointers.
    Where is the big difference? They were also much faster than SQL :)

    How is this different than OO DBs?

    by Adrian A.

    How is this different from (or better than) established OO DBs like db4o, NeoDatis, etc.?
