InfoQ

Gremlin, a Language for Working with Graphs

Posted by Abel Avram on Jan 15, 2010

Gremlin is a Turing-complete programming language for working with graphs. It is a Java DSL that makes extensive use of XPath to query, analyze, and manipulate graphs.

Gremlin can be used to create multi-relational graphs. Because the elements of such a graph (its vertices and edges) carry properties defined as key-value pairs, it is called a property graph. The following is an example:

[Figure: graph1, an example property graph]

The language has the following types:

  • graph: a graph is composed of a set of vertices and a set of edges.
  • vertex: a vertex is composed of a set of outgoing edges, incoming edges, and a map of properties.
  • edge: an edge is composed of an outgoing vertex, incoming vertex, and a map of properties.
  • boolean: a boolean can either be true or false.
  • number: a number is a natural (integer) or real (double) number.
  • string: a string is an array of characters.
  • list: a list is an ordered collection of potentially duplicate objects.
  • map: a map is an associative array from a set of object keys to a collection of object values.
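To make the composition of the graph, vertex, and edge types more concrete, here is a minimal sketch in Python (used purely for illustration; it is not Gremlin syntax). The class and field names are assumptions made for this example, and the edge label "knows" is a placeholder, since the article does not name the label of edge 7.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity-based equality avoids recursing through cyclic references
class Vertex:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)
    out_edges: List["Edge"] = field(default_factory=list, repr=False)
    in_edges: List["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    id: str
    label: str
    out_vertex: Vertex  # the vertex this edge leaves
    in_vertex: Vertex   # the vertex this edge arrives at
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Graph:
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


def connect(graph: Graph, edge_id: str, out_vertex: Vertex, label: str,
            in_vertex: Vertex, **properties: Any) -> Edge:
    """Create an edge and register it on both endpoints and on the graph."""
    edge = Edge(edge_id, label, out_vertex, in_vertex, dict(properties))
    out_vertex.out_edges.append(edge)
    in_vertex.in_edges.append(edge)
    graph.edges.append(edge)
    return edge


marko = Vertex("1", {"name": "marko", "age": 29})
vadas = Vertex("2", {"name": "vadas", "age": 27})
graph = Graph(vertices=[marko, vadas])
# "knows" is a hypothetical label; only the edge id and direction come from the article.
edge7 = connect(graph, "7", marko, "knows", vadas)

# Navigate from marko along his outgoing edges to the neighbouring vertices.
print([e.in_vertex.properties["name"] for e in marko.out_edges])  # ['vadas']
```

Each vertex keeps both its outgoing and incoming edges, so a traversal can follow an edge in either direction without scanning the whole graph.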

Besides XPath’s mathematical operations (addition, subtraction, multiplication, etc.), Gremlin has a number of statements such as If/Else, While, Repeat, and Foreach.

Gremlin can be used against any framework that implements the General Graph Model. This model consists of a number of components (graph, element, vertex, edge, and index) and their associated Java interfaces, which must be implemented before a graph can be manipulated using Gremlin constructs.
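The idea of a backend implementing a fixed set of interfaces can be sketched as follows. This is Python rather than Java, and it covers only the element and index components; every class and method name here is hypothetical and should not be read as the actual General Graph Model API.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Dict, Hashable, Set, Tuple


class Element(ABC):
    """What vertices and edges share: an id and key-value properties."""

    @abstractmethod
    def get_id(self) -> Hashable: ...

    @abstractmethod
    def get_property(self, key: str) -> Any: ...

    @abstractmethod
    def set_property(self, key: str, value: Hashable) -> None: ...


class Index(ABC):
    """Looks elements up by a property key and value."""

    @abstractmethod
    def put(self, key: str, value: Hashable, element: Element) -> None: ...

    @abstractmethod
    def get(self, key: str, value: Hashable) -> Set[Element]: ...

    @abstractmethod
    def remove(self, key: str, value: Hashable, element: Element) -> None: ...


class InMemoryIndex(Index):
    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Hashable], Set[Element]] = defaultdict(set)

    def put(self, key, value, element):
        self._entries[(key, value)].add(element)

    def get(self, key, value):
        return set(self._entries.get((key, value), set()))

    def remove(self, key, value, element):
        self._entries[(key, value)].discard(element)


class InMemoryElement(Element):
    def __init__(self, element_id: Hashable, index: Index) -> None:
        self._id = element_id
        self._properties: Dict[str, Hashable] = {}
        self._index = index

    def get_id(self):
        return self._id

    def get_property(self, key):
        return self._properties.get(key)

    def set_property(self, key, value):
        if key in self._properties:  # drop the stale index entry first
            self._index.remove(key, self._properties[key], self)
        self._properties[key] = value
        self._index.put(key, value, self)


index = InMemoryIndex()
marko = InMemoryElement("1", index)
marko.set_property("name", "marko")
marko.set_property("age", 29)
print(index.get("name", "marko") == {marko})  # True
```

Code written against `Element` and `Index` never touches the in-memory classes directly, which is what lets a different storage backend be swapped in.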

One example of using Gremlin is working with graphs saved as MongoDB documents. Others include working with Resource Description Framework stores such as OpenRDF, AllegroGraph, and Open Virtuoso, or with the Neo4j graph database. In the future, the Gremlin team intends to add support for CouchDB and Terracotta.

Example

To work with a JSON encoding of a graph, vertices and edges must follow these schemas (the first describes a vertex, the second an edge):

object {
    string "_id";
    array { string } inEdges;
    array { string } outEdges;
    object { }* properties;
};

object {
    string "_id";
    string label;
    string inVertex;
    string outVertex;
    object { }* properties;
};

Then, the graph pictured above is encoded in JSON as follows:

//// VERTEX COLLECTION ////
{
    _id: "1",
    properties: {
        name : "marko",
        age : 29
    },
    outEdges : ["7","8","9"]
}
{
    _id: "2",
    properties: {
        name : "vadas",
        age : 27
    },
    inEdges : ["7"]
}
... [section skipped for brevity]

//// EDGE COLLECTION ////
{
    _id: "12",
    label: "created",
    properties: { weight : 0.2 },
    outVertex : "6",
    inVertex : "3"
}
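As a rough check of documents against the two schemas, the following Python sketch validates the field types. It assumes that `inEdges`, `outEdges`, and `properties` may be omitted (the sample vertices above each leave one of the edge arrays out); the function names are made up for this example.

```python
from typing import Any, Dict, List


def _is_string_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def vertex_errors(doc: Dict[str, Any]) -> List[str]:
    """Return a list of schema violations for a vertex document (empty if valid)."""
    errors = []
    if not isinstance(doc.get("_id"), str):
        errors.append("_id must be a string")
    for key in ("inEdges", "outEdges"):
        # Assumption: a missing edge array means "no edges in that direction".
        if key in doc and not _is_string_list(doc[key]):
            errors.append(f"{key} must be an array of strings")
    if not isinstance(doc.get("properties", {}), dict):
        errors.append("properties must be an object")
    return errors


def edge_errors(doc: Dict[str, Any]) -> List[str]:
    """Return a list of schema violations for an edge document (empty if valid)."""
    errors = []
    for key in ("_id", "label", "inVertex", "outVertex"):
        if not isinstance(doc.get(key), str):
            errors.append(f"{key} must be a string")
    if not isinstance(doc.get("properties", {}), dict):
        errors.append("properties must be an object")
    return errors


# The documents shown in the article, written as Python dictionaries.
marko = {"_id": "1", "properties": {"name": "marko", "age": 29},
         "outEdges": ["7", "8", "9"]}
vadas = {"_id": "2", "properties": {"name": "vadas", "age": 27},
         "inEdges": ["7"]}
edge12 = {"_id": "12", "label": "created", "properties": {"weight": 0.2},
          "outVertex": "6", "inVertex": "3"}

print(vertex_errors(marko), vertex_errors(vadas), edge_errors(edge12))  # [] [] []
print(vertex_errors({"_id": 1, "outEdges": ["7", 8]}))
```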

This data is entered into MongoDB, which creates two collections, one for vertices and one for edges. Gremlin then manipulates these collections through the General Graph Model interface: adding or removing vertices and edges, listing vertices or edges, getting and setting properties on them, and navigating the graph by finding the edges attached to a vertex or the vertices at either end of an edge.
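The operations listed above can be sketched over two document collections. In this Python example, plain dictionaries keyed by `_id` stand in for the MongoDB collections, and all class and method names are hypothetical; it is not how Gremlin talks to MongoDB.

```python
from typing import Any, Dict, List


class DocumentGraph:
    """A graph kept as two id-keyed document collections (dicts stand in for MongoDB)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, dict] = {}
        self.edges: Dict[str, dict] = {}

    def add_vertex(self, vertex_id: str, **properties: Any) -> dict:
        doc = {"_id": vertex_id, "inEdges": [], "outEdges": [],
               "properties": dict(properties)}
        self.vertices[vertex_id] = doc
        return doc

    def add_edge(self, edge_id: str, out_vertex: str, label: str,
                 in_vertex: str, **properties: Any) -> dict:
        doc = {"_id": edge_id, "label": label, "outVertex": out_vertex,
               "inVertex": in_vertex, "properties": dict(properties)}
        self.edges[edge_id] = doc
        # Keep the vertex documents' edge arrays in step with the edge collection.
        self.vertices[out_vertex]["outEdges"].append(edge_id)
        self.vertices[in_vertex]["inEdges"].append(edge_id)
        return doc

    def remove_edge(self, edge_id: str) -> None:
        doc = self.edges.pop(edge_id)
        self.vertices[doc["outVertex"]]["outEdges"].remove(edge_id)
        self.vertices[doc["inVertex"]]["inEdges"].remove(edge_id)

    def remove_vertex(self, vertex_id: str) -> None:
        doc = self.vertices[vertex_id]
        # Removing a vertex also removes every edge attached to it.
        for edge_id in doc["inEdges"] + doc["outEdges"]:
            if edge_id in self.edges:
                self.remove_edge(edge_id)
        del self.vertices[vertex_id]

    def out_neighbours(self, vertex_id: str) -> List[str]:
        """Ids of the vertices reached by following outgoing edges."""
        return [self.edges[e]["inVertex"] for e in self.vertices[vertex_id]["outEdges"]]


g = DocumentGraph()
g.add_vertex("6")
g.add_vertex("3")
g.add_edge("12", "6", "created", "3", weight=0.2)  # edge 12 from the article
g.edges["12"]["properties"]["weight"] = 0.4        # setting a property
print(g.out_neighbours("6"))                       # ['3']
g.remove_vertex("3")
print(g.edges, g.vertices["6"]["outEdges"])        # {} []
```

Storing edge ids on the vertex documents duplicates information, but it lets navigation start from a vertex without scanning the whole edge collection.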

Adding two vertices to a graph and connecting them through an edge named “related_to” is done as follows:

gremlin> $v := g:add-v($g)
==>v[0]
gremlin> $u := g:add-v($g)
==>v[1]
gremlin> $e := g:add-e($g, $v, 'related_to', $u)
==>e[2][0-related_to->1]
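The output notation can be read as `v[id]` for a vertex and `e[id][out-label->in]` for an edge. The Python sketch below mimics that session; it assumes vertices and edges draw their ids from a single counter, which is inferred from the ids 0, 1, and 2 in the output and is not confirmed by the article. The class and its methods are hypothetical.

```python
from typing import Dict, Set, Tuple


class TinyGraph:
    """Hands out ids from one counter shared by vertices and edges (an assumption)."""

    def __init__(self) -> None:
        self._next_id = 0
        self.vertices: Set[int] = set()
        self.edges: Dict[int, Tuple[int, str, int]] = {}

    def _new_id(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def add_v(self) -> int:
        vertex_id = self._new_id()
        self.vertices.add(vertex_id)
        return vertex_id

    def add_e(self, out_vertex: int, label: str, in_vertex: int) -> int:
        edge_id = self._new_id()
        self.edges[edge_id] = (out_vertex, label, in_vertex)
        return edge_id

    def show_v(self, vertex_id: int) -> str:
        return f"v[{vertex_id}]"

    def show_e(self, edge_id: int) -> str:
        out_vertex, label, in_vertex = self.edges[edge_id]
        return f"e[{edge_id}][{out_vertex}-{label}->{in_vertex}]"


g = TinyGraph()
v = g.add_v()
u = g.add_v()
e = g.add_e(v, "related_to", u)
print(g.show_v(v), g.show_v(u), g.show_e(e))  # v[0] v[1] e[2][0-related_to->1]
```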

Graphs have many possible applications in computer chip design, biology, networks, etc. One simple example is a graph in which the vertices represent a website’s pages and an edge runs from each page to every page it links to. Gremlin makes it possible to navigate and modify such a graph of pages and their properties.
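A small Python sketch of the website example follows. The page paths are invented for illustration, and the two helper functions are hypothetical; they only show what navigating a link graph can mean in practice.

```python
from collections import deque
from typing import Dict, List, Set


def incoming_links(pages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the link map: for each page, the set of pages that link to it."""
    incoming: Dict[str, Set[str]] = {page: set() for page in pages}
    for page, targets in pages.items():
        for target in targets:
            incoming.setdefault(target, set()).add(page)
    return incoming


def reachable(pages: Dict[str, List[str]], start: str) -> Set[str]:
    """Pages that can be reached from `start` by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in pages.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Vertices are pages; each list holds the outgoing edges (links) of that page.
pages = {
    "/index": ["/about", "/blog"],
    "/about": ["/index"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/orphan": ["/index"],
}

print(sorted(incoming_links(pages)["/index"]))  # ['/about', '/orphan']
print(sorted(reachable(pages, "/index")))
```

Pages with no incoming links, such as "/orphan" here, show up as empty sets in the inverted map, which is one simple question a page graph can answer.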

Resources: TinkerGraph – a reference implementation of the General Graph Model, Gremlin Documentation, Gremlin User Group.
