Posted by Michael Hunger on Feb 25, 2010
After several years of development, the developers at NeoTechnology have released version 1.0 of Neo4j, a Java-based graph database which follows the property graph data model. InfoQ spoke with NeoTechnology COO Peter Neubauer to learn more about the current Neo4j release and what it offers to developers.
The Neo4j kernel JAR, which weighs in at around 440k, is available both under the open-source AGPLv3 and as a commercially licensed version, with a commercial license required if Neo4j is used in closed-source software. Information in Neo4j is represented using three basic building blocks:
Nodes
Relationships: each connects two Nodes, and it has both a direction and a RelationshipType
Properties: String key/Object value pairs which can exist on both Nodes and Relationships

Compared to relational databases, graph databases tend to do better with high volumes of complex, interconnected, low-structure data that changes rapidly and requires frequent querying. In a relational model these queries lead to large numbers of table joins, which can cause performance issues. Neubauer explained this in more detail, saying:
Neo4j addresses the problem of performance degradation over queries that involve many joins in a traditional RDBMS. By modeling the data around graphs, Neo4j can traverse along nodes and edges with the same speed, independently of the amount of data constituting the graph. This gives secondary effects like very fast graph algos, recommender systems and OLAP-style analytics that are currently not possible with normal RDBMS setups.
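The three building blocks and the join-free traversal Neubauer describes can be illustrated with a minimal in-memory sketch. This is not the Neo4j API: the `Node` and `Relationship` classes and the `connect` and `neighbours` helpers are hypothetical names chosen for illustration, and persistence and transactions are left out entirely. The point it shows is that each node holds direct references to its relationships, so following a hop costs a lookup on that node rather than a join over the whole dataset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity-based equality avoids recursing through the graph
class Node:
    """Hypothetical node: properties plus direct references to its relationships."""
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List["Relationship"] = field(default_factory=list)
    incoming: List["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    """Hypothetical directed, typed edge from `start` to `end` with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(start: Node, end: Node, rel_type: str, **properties: Any) -> Relationship:
    """Create a relationship and register it on both of its endpoints."""
    rel = Relationship(start, end, rel_type, properties)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbours(node: Node, rel_type: str) -> List[Node]:
    """Follow outgoing relationships of one type; only this node's edges are inspected."""
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
carol = Node({"name": "Carol"})
connect(alice, bob, "KNOWS", since=2008)
connect(bob, carol, "KNOWS", since=2009)
connect(alice, carol, "WORKS_WITH")

# Two hops along KNOWS: a query that would need self-joins in a relational schema.
friends_of_friends = [
    fof.properties["name"]
    for friend in neighbours(alice, "KNOWS")
    for fof in neighbours(friend, "KNOWS")
]
print(friends_of_friends)
```

In a relational schema the same friends-of-friends question would typically be answered by joining a relationship table with itself once per hop.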
Since Neo4j is a database, each access to the graph structure (read, write, or traversal) is managed by an ACID transaction system. Graph traversal is handled through a Traverser API. Indexing support is provided via integration with Lucene, and an integration with Solr is in the works. A presentation by NeoTechnology CEO Emil Eifrem, which gives a more detailed introduction to Neo4j, is also available, as is an interview with Peter Neubauer.
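To give a rough idea of what a traverser does, the following sketch walks a small graph breadth-first, following only one relationship type up to a maximum depth. It is an illustration under stated assumptions rather than Neo4j's Traverser API: the `traverse` function, its parameters, and the dictionary-based graph are all hypothetical, and the transactional handling mentioned above is not modelled.

```python
from collections import deque


def traverse(graph, start, rel_type, max_depth):
    """Yield (node, depth) pairs breadth-first, following only `rel_type` edges.

    `graph` is a hypothetical adjacency mapping: node -> list of (rel_type, target).
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        yield node, depth
        if depth == max_depth:
            continue  # do not expand nodes that sit at the depth limit
        for edge_type, target in graph.get(node, []):
            if edge_type == rel_type and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))


graph = {
    "alice": [("KNOWS", "bob"), ("WORKS_WITH", "dave")],
    "bob": [("KNOWS", "carol")],
    "carol": [("KNOWS", "erin")],
}

visited = list(traverse(graph, "alice", "KNOWS", max_depth=2))
print(visited)
```

Here "dave" is skipped because he is reached through a different relationship type, and "erin" is skipped because she lies beyond the depth limit.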
When asked about his stand on the NoSQL movement, Neubauer replied:
We are definitely part of the NoSQL movement in that we are trying to solve problems that RDBMS are not addressing right now. That said, our focus is first the complexity of data, deep queries and analytics - operations that require many joins and sparse tables in RDBMS, and second the scalability and sharding type of problems that many of the other NoSQL projects are focusing on.
Neubauer indicated that, although the 1.0 release is recent, Neo4j has been used in production for as long as 7 years in some areas, and that the 1.0 version number was intended to indicate API stability rather than codebase stability. The performance of Neo4j was also touched upon: Neo4j can handle graphs with billions of objects without code modification, typically delivers around 2 million relationship reads per second, and performs shortest-path calculations which scale far better than with a relational database like MySQL. As with all performance benchmarks, many factors, including the underlying hardware and the dataset used, can cause dramatic changes in results.
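As an illustration of the kind of shortest-path calculation mentioned here, the sketch below runs a breadth-first search over an unweighted adjacency list. It is a generic textbook technique, not Neo4j's implementation, and it says nothing about the quoted performance figures; the `shortest_path` function and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None.

    `graph` is a hypothetical adjacency mapping: node -> list of neighbouring nodes.
    """
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target in parents:
                continue
            parents[target] = node
            if target == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(target)
    return None


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
path = shortest_path(graph, "a", "e")
print(path)
```

Each step only inspects the neighbours of the node currently being expanded, which is the property that lets graph-native storage avoid the repeated joins a relational formulation of the same search would need.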
In addition to the main Neo4j codebase, there is a community of contributors and users and a larger ecosystem around the project.
With respect to future plans for Neo4j, a recent round of funding has helped to drive development. Planned work includes enhancing the existing master/slave replication and online backup support to provide seamless high availability with eventual consistency and write-master re-election, better overall operations support, and full REST support (including dynamic JavaScript-based traversal and a read-only mode for data publishing). Longer-term plans include sharding support, which brings a new set of challenges to the Neo4j codebase. Emil Eifrem also stressed the importance of the large and rapidly growing community of users and developers, which has created hundreds of Neo4j projects.