Cassandra CLI Internals Using JArchitect

Posted by Dane Dennis on Dec 19, 2013

Relational Database Management Systems (RDBMS) are the most commonly used systems to store and use data, but for extremely large amounts of data, these databases don’t scale up well.

The concept of NoSQL has been gaining a lot of popularity in recent years due to the growing demand for relational database alternatives. The biggest motivation behind NoSQL is scalability. NoSQL database solutions offer a way to store and use extremely large amounts of data with less overhead, less work, better performance, and less downtime.

Apache Cassandra is a column-based NoSQL database. It was developed at Facebook to power their Inbox Search feature, and it later became an Apache open source project. Twitter, Digg, Reddit, and quite a few other organizations started using it.

Cassandra ships with a very basic interactive command line interface (CLI). Using the CLI you can connect to remote nodes in the cluster to create or update your schema and set and retrieve records.

The CLI is a useful tool for Cassandra administrators, and even though it provides only basic commands, it is a good example of how to implement a Cassandra client. Understanding how the CLI works internally helps when developing custom Cassandra clients or extending the CLI tool itself.

In this article, we will explore the architecture of the Cassandra CLI, using the JArchitect tool and the CQLinq language to analyze its code base. JArchitect is used to analyze code structure and specify design rules to achieve better code quality. With JArchitect, software quality can be measured using code metrics, visualized using graphs and treemaps, and enforced using standard and custom rules.

Here’s the dependency graph after analysis:

Cassandra uses some well-known JARs like antlr, log4j, slf4j, and commons-lang, and also some lesser-known JARs like the following:

  • Libthrift: an API spanning a variety of programming languages and use cases. The goal is to make reliable, performant communication and data serialization across languages as efficient and seamless as possible.
  • Snakeyaml: YAML is a data serialization format designed for human readability and interaction with scripting languages. Cassandra uses this format for its configuration files.
  • Jackson: a high-performance JSON processor.
  • Snappy: snappy-java is a Java port of Snappy, a fast compressor/decompressor written in C++ and originally developed by Google.
  • High-scale-lib: a collection of concurrent and highly scalable utilities. These are intended as direct replacements for the java.util.* or java.util.concurrent.* collections, but with better performance when many CPUs are using the collection concurrently.

The Matrix view below gives us more details about the dependency weight between these JAR files.

Cassandra Command Line Interface

The command line interface logic is implemented in the org.apache.cassandra.cli package, and the entry point is the CliMain class.

Let’s search for the methods invoked from the main method by using the following CQLinq query: 

from m in Methods where m.IsUsedBy ("org.apache.cassandra.cli.CliMain.main(String[])") 
select new { m, m.NbBCInstructions } 

The main method uses JLine, a Java library for handling console input. It can be used to write nice CLI applications without much effort. It has out-of-the-box support for command history, tab completion, line editing, custom key bindings, and character masking.

Two interesting methods invoked from the main method are:

  • connect: This method is used to connect to the Cassandra database server.
  • processStatementInteractive: This method is used to execute commands from the user.
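
The flow formed by these two methods can be sketched roughly as follows. This is a minimal illustration, not the actual CliMain code: the class MiniCli, its method bodies, and the "quit;"/"exit;" handling are assumptions made for the example, and a plain java.util.Scanner stands in for JLine.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical miniature of the CLI main flow; not the actual CliMain code.
class MiniCli {
    private boolean connected = false;
    private final List<String> executed = new ArrayList<>();

    // Stand-in for connect: a real client would open a Thrift transport here.
    void connect(String host, int port) {
        connected = true;
    }

    // Stand-in for processStatementInteractive: records the statement
    // instead of sending it to a server.
    void processStatement(String statement) {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
        executed.add(statement);
    }

    // Reads statements line by line until "quit;" / "exit;" or end of input.
    List<String> run(Readable console) {
        Scanner scanner = new Scanner(console);
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.equals("quit;") || line.equals("exit;")) {
                break;
            }
            processStatement(line);
        }
        return executed;
    }

    public static void main(String[] args) {
        MiniCli cli = new MiniCli();
        cli.connect("localhost", 9160); // host and port are example values
        System.out.println(cli.run(new StringReader("show keyspaces;\nquit;\n")));
    }
}
```

A console library such as JLine would replace the Scanner and add history and completion on top of the same loop.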

 

 

Communication between CLI and Cassandra Server

Before interacting with the Cassandra server, the client must connect to it using the connect method.

Let’s search for all methods used directly or indirectly by the connect method:

from m in Methods
let depth0 = m.DepthOfIsUsedBy("org.apache.cassandra.cli.CliMain.connect(String,int)")
where depth0 >= 0 orderby depth0
select new { m, depth0 }

The CLI communicates with the server using the Thrift library, which allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the Thrift compiler generates code for building RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a lot of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.

Here’s a simple example of an implementation of a Thrift server:

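import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TSimpleServer;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TServerTransport;
import org.apache.thrift.transport.TTransportException;

// "Something" is the service class generated by the Thrift compiler
// from the definition file.
public class Server {
    // Handler: the implementation behind each service method.
    public static class SomethingHandler implements Something.Iface {
        public SomethingHandler() {}

        public int ping() {
            return 1;
        }
    }

    public static void main(String[] args) throws TTransportException {
        SomethingHandler handler = new SomethingHandler();
        Something.Processor processor = new Something.Processor(handler);
        TServerTransport serverTransport = new TServerSocket(9090); // listening port
        // Older Thrift releases take (processor, transport) directly;
        // newer ones expect a TServer.Args builder instead.
        TServer server = new TSimpleServer(processor, serverTransport);
        // Or use this for a multithreaded server:
        // server = new TThreadPoolServer(processor, serverTransport);
        server.serve(); // blocks and serves incoming requests
    }
}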

The Thrift server extends the org.apache.thrift.server.TServer abstract class, and the constructor of its implementation takes as parameters a processor and a server transport specification. The processor needs a handler to handle the incoming requests.

Let’s discover all these elements in the Cassandra server. We can begin by searching for all classes that inherit from the TServer class.

from t in Types
let depth0 = t.DepthOfDeriveFrom("org.apache.thrift.server.TServer")
where depth0 >= 0 orderby depth0
select new { t, depth0 }
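
To make the notion of derivation depth concrete, here is a small sketch in Java of what such a query computes for classes. It is an illustration only: the helper depthOfDeriveFrom is hypothetical, it walks superclasses but not interfaces, and returning 0 for the base type itself and -1 for unrelated types is an assumption chosen to match the "depth0 >= 0" filter above, not a statement about JArchitect's internals.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;

// Hypothetical helper illustrating "depth of derivation" via reflection.
class DerivationDepth {
    // Returns 0 if type == base, n if base is n superclass steps above type,
    // and -1 if type does not derive from base.
    static int depthOfDeriveFrom(Class<?> type, Class<?> base) {
        int depth = 0;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            if (c == base) {
                return depth;
            }
            depth++;
        }
        return -1;
    }

    public static void main(String[] args) {
        // ArrayList -> AbstractList -> AbstractCollection
        System.out.println(depthOfDeriveFrom(ArrayList.class, AbstractCollection.class));
        // String is unrelated to AbstractCollection
        System.out.println(depthOfDeriveFrom(String.class, AbstractCollection.class));
    }
}
```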

Cassandra defines:

  • CustomTThreadPoolServer: a slightly modified version of the Apache Thrift TThreadPoolServer, which uses a thread pool to serve incoming requests.
  • CustomTHsHaServer: the goal of this server is to avoid tying all I/O to a single CPU. For better throughput, I/O is spread across multiple selector threads; the number of selector threads can be the number of CPUs available.
  • CustomTNonBlockingServer: a server that uses a nonblocking socket transport.
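
The thread-pool approach can be illustrated without Thrift at all. The sketch below is not Cassandra's CustomTThreadPoolServer; it is a hypothetical line-based server (PooledPingServer, with a made-up "ping" request) built on plain java.net sockets, showing only the pattern: one acceptor thread hands each connection to a pool worker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical server illustrating the accept-loop plus worker-pool pattern.
class PooledPingServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService workers;

    PooledPingServer(int poolSize) {
        serverSocket = openOnFreePort();
        workers = Executors.newFixedThreadPool(poolSize);
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private static ServerSocket openOnFreePort() {
        try {
            return new ServerSocket(0); // 0 = let the OS pick a free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // The acceptor only accepts; each connection is served by a pool worker.
    private void acceptLoop() {
        while (!serverSocket.isClosed()) {
            try {
                Socket client = serverSocket.accept();
                workers.submit(() -> serve(client));
            } catch (IOException | RejectedExecutionException e) {
                return; // server socket closed or pool shut down
            }
        }
    }

    // Answers "ping" with "1", mirroring the ping() handler shown earlier.
    private void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("ping".equals(line) ? "1" : "unknown");
            }
        } catch (IOException ignored) {
            // client disconnected; resources are released by try-with-resources
        }
    }

    // Minimal client: sends one request line and returns the one-line reply.
    static String call(int port, String request) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            out.println(request);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            serverSocket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        try (PooledPingServer server = new PooledPingServer(4)) {
            System.out.println(call(server.port(), "ping"));
        }
    }
}
```

With a blocking worker per connection, the pool size bounds the number of clients served at once, which is the trade-off the nonblocking variants aim to avoid.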

And here’s what happens when the ThriftServer is started:

A factory is used to create a TServer, and the CassandraServer handler is created to handle incoming requests. It implements Cassandra.Iface, which contains all commands supported by Cassandra. The diagram below shows some of these methods:

 

As shown in the previous Thrift server example, we need the processors to process incoming requests; all these processors inherit from ProcessFunction.

Here are some Cassandra processors:

After discovering the parts of the Cassandra Thrift server, let’s come back to the client and discover what happens when the connect method is invoked from the main method.

The org.apache.thrift.TServiceClient class is used to communicate between the client and the server, and its sendBase method is invoked to send a message to the Thrift server.

On the server, the login processor receives this request and invokes the login method.
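
The round trip can be sketched with plain Java streams. This is a simplified illustration under stated assumptions, not the Thrift wire format or the generated Cassandra code: the class CallFraming, the methods sendCall and process, the header layout (method name, message type, sequence id), and the hard-coded credentials are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified call framing; NOT the actual Thrift binary protocol layout.
class CallFraming {
    static final byte CALL = 1; // hypothetical marker for "this is a request"

    // Client side: write the header (name, type, sequence id), then the arguments.
    static byte[] sendCall(String method, int seqId, String user, String password) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(method);
            out.writeByte(CALL);
            out.writeInt(seqId);
            out.writeUTF(user);
            out.writeUTF(password);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side: read the header, select the processor by name, call the handler.
    static String process(byte[] message) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
            String method = in.readUTF();
            byte type = in.readByte();
            int seqId = in.readInt();
            if (type != CALL || !"login".equals(method)) {
                return "seq " + seqId + ": unknown method " + method;
            }
            String user = in.readUTF();
            String password = in.readUTF();
            return "seq " + seqId + ": " + login(user, password);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Toy handler with hard-coded credentials, for illustration only.
    static String login(String user, String password) {
        return "admin".equals(user) && "secret".equals(password) ? "ok" : "denied";
    }

    public static void main(String[] args) {
        System.out.println(process(sendCall("login", 1, "admin", "secret")));
    }
}
```

The sequence id lets a client match a reply to the request that caused it, which is why it travels in the header alongside the method name.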

And here’s the dependency graph showing some methods invoked from the login method.

Steps to extend the CLI by adding a new method MyMethod.

After discovering how the CLI works internally, we can easily add a new method to it, and here are the major steps needed to do it:

I - Extending the server:

  • Add the method to Cassandra.Iface.
  • Add the method implementation to the CassandraServer class.
  • Add a new class Cassandra.Processor.MyMethod<I> inheriting from ProcessFunction<T>.
  • Add an instance of the new processor to the Map returned by the Cassandra.Processor<I>.getProcessMap method.

II - Extending the client:

  • Add a new switch and process it from CliOptions.processArgs method.
  • Add a method to the Cassandra.Client class and invoke the server by using the TServiceClient.sendBase method.
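
The steps above can be sketched end to end in a dependency-free miniature. Everything here is hypothetical: the nested types only mimic the roles of Cassandra.Iface, CassandraServer, ProcessFunction, the process map, Cassandra.Client, and CliOptions.processArgs, the "--my-method" switch is invented for the example, and a direct method call stands in for the Thrift transport.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of the extension steps; not the generated Thrift code.
class ExtensionSketch {
    // Server step 1: the service interface gains the new method.
    interface Iface {
        String describeVersion();
        String myMethod(String arg); // new
    }

    // Server step 2: the handler implements it.
    static class Server implements Iface {
        public String describeVersion() { return "sketch-1.0"; }
        public String myMethod(String arg) { return "myMethod(" + arg + ")"; } // new
    }

    // Server step 3: one process function per method, invoking the handler.
    interface ProcessFunction {
        String process(Iface handler, String arg);
    }

    // Server step 4: register the new function in the process map, keyed by name.
    static Map<String, ProcessFunction> getProcessMap() {
        Map<String, ProcessFunction> map = new HashMap<>();
        map.put("describe_version", (handler, arg) -> handler.describeVersion());
        map.put("myMethod", (handler, arg) -> handler.myMethod(arg)); // new
        return map;
    }

    // Server-side dispatch: look up the process function by method name.
    static String dispatch(Iface handler, String method, String arg) {
        ProcessFunction fn = getProcessMap().get(method);
        return fn == null ? "unknown method: " + method : fn.process(handler, arg);
    }

    // Client step 2: a client method that sends the call by name.
    static class Client {
        private final Iface remote; // stands in for the network transport
        Client(Iface remote) { this.remote = remote; }
        String myMethod(String arg) { return dispatch(remote, "myMethod", arg); }
    }

    // Client step 1: recognize a new command-line switch and act on it.
    static String processArgs(String[] args, Client client) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--my-method".equals(args[i])) {
                return client.myMethod(args[i + 1]);
            }
        }
        return "no command";
    }

    public static void main(String[] args) {
        Client client = new Client(new Server());
        System.out.println(processArgs(new String[] {"--my-method", "hello"}, client));
    }
}
```

In the real code base the interface, processor classes, and client are generated by the Thrift compiler from the definition file, so in practice the change would normally start there rather than in the generated sources.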

Conclusion

The command line interface is a good example for learning how to implement a Cassandra client, and learning from real projects is preferable to just searching for samples on the web. So when developing a Cassandra client, don’t hesitate to dive into its source code and enjoy.

About the Author

Dane Dennis is the JArchitect Product Manager. He works at CoderGears, a company developing tools for developers and architects.
