BT
x Your opinion matters! Please fill in the InfoQ Survey about your reading habits!

Cassandra Mythology

Posted by Jonathan Ellis on Jun 26, 2013 |

Like the prophetess of Troy it was named for, Apache Cassandra has seen some myths accrue around it. Like most myths, these were once at least partly true, but have become outdated as Cassandra evolved and improved. In this article, I'll discuss five common areas of concern and clarify the confusion.

Myth: Cassandra is a map of maps

As applications using Cassandra became more complex, it became clear that schema and data typing make development and maintenance much easier at scale than "everything is a bytebuffer," or "everything is a string."

Today, the best way to think of Cassandra's data models is as tables and rows. Similar to a relational database, Cassandra columns are strongly typed and indexable.

Other things you might have heard:

  • "Cassandra is a column database." Column databases store all values for a given column together on disk. This makes them suitable for data warehouse workloads, but not for running applications that require fast access to specific rows.
  • "Cassandra is a wide-row database." There is a grain of truth here, which is that Cassandra's storage engine is inspired by Bigtable, the grandfather of wide-row databases. But wide-row databases tie their data model too closely to that storage engine, which is easier to implement but more difficult to develop against, and prevents many optimizations.

One of the reasons we shied away from "tables and rows" to start with is that Cassandra tables do have some subtle differences from the relational ones you're familiar with. First, the first element of a primary key is the partition key. Rows in the same partition will be owned by the same servers, and partitions are clustered.

Second, Cassandra does not support joins or subqueries, because joins across machines in a distributed system are not performant. Instead, Cassandra encourages denormalization to get the data you need from a single table, and provides tools like collections to make this easier.

For example, consider the users table shown in the following code example:

CREATE TABLE users (
user_id uuid PRIMARY KEY,
name text,
state text,
birth_year int
);

Most modern services understand now that users have multiple email addresses. In the relational world, we'd add a many-to-one relationship and correlate addresses to users with a join, like the following example:

CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);
SELECT *
FROM users NATURAL JOIN users_addresses;

In Cassandra, we'd denormalize by adding the email addresses directly to the users table. A set collection is perfect for this job:

ALTER TABLE users ADD email_addresses set<text>;

We can then add addresses to a user record like this:

UPDATE users
SET email_addresses = {‘jbe@gmail.com’, ‘jbe@datastax.com’}
WHERE user_id = ‘73844cd1-c16e-11e2-8bbd-7cd1c3f676e3’

See the documentation for more on the Cassandra data model, including self-expiring data and distributed counters.

Myth: Cassandra is slow at reads

While Cassandra's log-structured storage engine means that it does not seek for updates on hard disks or cause write amplification on solid-state disks, it is also fast at reads.

Here are the throughput numbers from the random-access read, random-access and sequential-scan, and mixed read/write workloads from the University of Toronto's NoSQL benchmark results:

The Endpoint benchmark comparing Cassandra, HBase, and MongDB corroborates these results.

How does this work? At a very high level, Cassandra's storage engine looks similar to Bigtable, and uses some of the same terminology. Updates are appended to a commitlog, then collected into a "memtable" that is eventually flushed to disk and indexed, as an "sstable:"

Naive log-structured storage engines do tend to be slower at reads, for the same reason they are fast at writes: new values in rows do not overwrite the old ones in-place, but must be merged in the background by compaction. So in the worst case, you will have to check multiple sstables to retrieve all the columns for a "fragmented" row:

Cassandra makes several improvements to this basic design to achieve good read performance:

Myth: Cassandra is hard to run

There are three aspects to running a distributed system that tend to be more complicated than running a single-machine database:

  1. Initial deployment and configuration
  2. Routine maintenance such as upgrading, adding new nodes, or replacing failed ones
  3. Troubleshooting

Cassandra is a fully distributed system: every machine in a Cassandra cluster has the same role. There are no metadata servers that have to fit everything in memory. There are no configuration servers to replicate. There are no masters, and no failover. This makes every aspect of running Cassandra simpler than the alternatives. It also means that bringing up a single-node cluster to develop and test against is trivial, and behaves exactly the way a full cluster of dozens of nodes would.

Initial deployment is the least important concern here in one sense: other things being equal, even a relatively complex initial setup will be insignificant when amortized over the lifetime of the system, and automated installation tools can hide most of the gory details. But! If you can barely understand a system well enough to install it manually, there's going to be trouble when you need to troubleshoot a problem, which requires much more intimate knowledge of how all the pieces fit together.

Thus, my advice would be to make sure you understand what is going on during installation, as in this two-minute example of setting up a Cassandra cluster, before relying on tools like the Windows MSI installer, OpsCenter provisioning or the self-configuring AMI.

Cassandra makes routine maintenance easy. Upgrades can be done one node at a time. While a node is down, other nodes will save updates that it missed and forward them when it comes back up. Adding new nodes is parallelized across the cluster; there is no need to rebalance afterwards.

Even dealing with longer, unplanned outages is straightforward. Cassandra's active repair is like rsync for databases, only transferring missing data and keeping network traffic minimal. You might not even notice anything happened if you're not paying close attention.

Cassandra's industry leading support for multiple datacenters even makes it straightforward to survive an entire AWS region going down or losing a datacenter to a hurricane.

Finally, DataStax OpsCenter simplifies troubleshooting by making the most important metrics in your cluster available at a glance, allowing you to easily correlate historical activity with the events causing service degradation. The DataStax Community Edition Cassandra distribution includes a "lite" version of OpsCenter, free for production use. DataStax Enterprise also includes scheduled backup and restore, configurable alerts, and more.

Myth: Cassandra is hard to develop against

The original Cassandra Thrift API achieved its goal of giving us a cross-platform base for a minimum of effort, but the result was admittedly difficult to work with. CQL, Cassandra's SQL dialect, replaces that with an easier interface, a gentler learning curve, and an asynchronous protocol.

CQL has been available for early adopters beginning with version 0.8 two years ago; with the release of version 1.2 in January, CQL is production ready, with many drivers available and better performance than Thrift. DataStax is also officially supporting the most popular CQL drivers, which helps avoid the sometimes indifferent support seen with the community Thrift drivers.

Patrick McFadin's Next Top Data Model presentations (one, two) are a good introduction to CQL beyond the basics in the documentation.

Myth: Cassandra is still bleeding edge

From an open source perspective, Apache Cassandra is now almost five years old and has many releases under its belt, with version 2.0 coming up in July. From an enterprise point of view, DataStax provides DataStax Enterprise, which includes a certified version of Cassandra that has been specifically tested, benchmarked, and approved for production environments.

Businesses have seen the value that Cassandra brings to their organizations. Over 20 of the Fortune 100 rely on Cassandra to power their mission-critical applications in nearly every industry, including financial, health care, retail, entertainment, online advertising and marketing.

The most common reason to move to Cassandra is when the existing technology can't scale to the demands of modern big data applications. Netflix, the largest cloud application in the world, moved 95% of their data from Oracle to Cassandra. Barracuda Networks replaced MySQL with Cassandra when MySQL couldn't handle the volume of requests needed to combat modern spammers. Ooyala handles two billion data points every day, on a Cassandra deployment of more than two petabytes.

Cassandra is also augmenting or replacing legacy relational databases that have proven either too costly to manage and maintain. Constant Contact's initial project with Cassandra took three months and $250,000, compared to nine months and $2,500,000 on their traditional RDBMS. Today they have six clusters and more than 100TB of data trusted to Cassandra.

Many other examples can be found in DataStax's case studies and Planet Cassandra’s user interviews.

Not a myth: the 2013 Cassandra Summit in San Francisco

We just completed the best way to learn more about Cassandra, with over 1100 attendees and 65 sessions from Accenture, Barracuda Networks, Blue Mountain Capital, Comcast, Constant Contact, eBay, Fusion-io, Intuit, Netflix, Sony, Splunk, Spotify, Walmart, and more. Slides are already up; follow Planet Cassandra to be notified when videos are available.
 

About the Author

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, he worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Software diversity is key by Arturo Hernandez

It does look like Cassandra is one of the most evolved players in NoSQL. Personally I don't see Cassandra or NoSQL as replacements for SQL databases. They are just good at different things, so the choice really depends on the application.

My personal interpretation is, all things being equal, SQL databases are easier to use and more reliable. Here is the inequality, SQL databases don't scale well to BigData. Unless you are _sharing_ infrastructure it does not make sense to use NoSQL below 100GB just to give a number. Unless the development team knows NoSQL inside-out and has little experience with SQL databases.

My bias opinion by peter lin

Having contributed to hector and built a temporal database on top of Cassandra, my opinion is the combination of CQL and Thrift is very power. Some concepts from SQL simply don't translate well and negate many of the benefits of Cassandra. To get the most out of cassandra, people really should use both thrift and CQL.

Other noSql databases haven't progressed as far, even though some of them predate Cassandra. Kudos to the cassandra team for the huge progress over the last 2 years.

Comparison with MongoDb? by Steve Macdonald

Great article, Jonathan. Are you aware of any whitepapers or web articles that do a deep dive into the differences between Cassandra and MongoDb?

Re: Software diversity is key by Jonathan Ellis

There's some truth to that, which I've phrased as, "if your data fits on a single machine, it doesn't matter what database you run."

However, we're seeing a healthy minority of Cassandra users come to us not because of their scaling needs, but because Cassandra's replication is similarly worlds better than you get with relational databases or master/slave-based nosql like HBase or MongoDB. Fully distributed replication with clusters spanning multiple datacenters is a powerful tool. You can read more about this at www.datastax.com/dev/blog/multi-datacenter-repl... and some more low level detail from the recent Summit at www.slideshare.net/planetcassandra/6-jason-brown.

Re: Comparison with MongoDb? by Jonathan Ellis

Apple and Oranges - Throughput v Latency by Martin Anderson

Myth Two - Cassandra is slow at reads, is still very much true. To justify your claim you used throughput as a metric but this makes no sense since speed is measured by latency whereas throughput describes the scalable nature of a system. That Cassandra does not do fast reads is even supported by the study that you cite (figures 4 & 7) where Cassandra's read latency is 5-8ms versus sub-millisecond latencies for other NoSQL solutions. What is worse is that the paper only presented average latencies and to fit both Cassandra and HBase in they had to use a logarithmic axis! From personal experience the 95th/99th percentile for Cassandra can be substantially worse especially with stronger consistency requirements.

Now I know that this all depends on how you define fast. Ten milliseconds may be blazingly fast for your particular performance envelope, e.g. off-line reads, but there are many cases where it is just not fast enough.

Cassandra is a fantastic system for the correct use case but pretending that it is fast in comparison to other NoSQL solutions is plain wrong and if you were fair you would edit your otherwise excellent article to reflect this.

Re: Apple and Oranges - Throughput v Latency by Jonathan Ellis

Hi Martin,

You're right that latency is a valid measure of speed, but throughput is as well. For this category of software, ops/s (throughput) is generally more important than latency, as long as you meet a certain minimum latency bar; this is what Cassandra optimizes for.

I'd also note that while Voldemort (alone of the nosql systems that scaled reliably) did beat Cassandra on latency, Cassandra in turn beat HBase by an order of magnitude, as well as MongoDB in the Endpoint benchmark.

So while I agree that there's room to improve, I stand by my interpretation.

Re: Apple and Oranges - Throughput v Latency by Martin Anderson

> You're right that latency is a valid measure of speed, but throughput is as well

If they are both valid measurements of speed why did you not include the latency figures for completeness along with the 3 you posted concerning throughput? How is that not selective reporting?

If you are noting the comparison with other NoSQL solutions, why did you not also include the Altoros benchmark that showed that Couchbase showed lower latency, higher max throughput then Couchbase and still while supporting strong consistency? How is this not selective reporting either?

I like Cassandra. I think it is a superb solution for write heavy BigData situations for columnar data where you don't need strong consistency and do not have a low latency requirement. The latest versions are stable and reliable and it's a great fit for many areas in enterprise but speed as judged by the latency will remain an issue for some use cases.

Re: Apple and Oranges - Throughput v Latency by peter lin

If latency is critical, why not use a different tool for the job? Something like an in-memory database or distributed cache? From my own experience building trading systems, when latency for a particular type of data matters, cache it in memory. With some data grid products, you can build a large multi-terabyte database with very low latencies.

Error adding a set column type by Arthur Zubarev

In Cassandra 1.2.4 trying to add the set collection column results in error:
Bad Request: Cannot use collection types with non-composite PRIMARY KEY
But the table in your example does not have a composite PK.

Re: Apple and Oranges - Throughput v Latency by Martin Anderson

You are answering a different question - should Cassandra be used for a high performance cache similar to Coherence? The answer to that is obviously no but that was not the question we were addressing!

We were addressing the assertion "Cassandra is slow at reads" made in the article. The reality is that compared to other similar solutions, yes it is comparatively slow which renders it unsuitable for numerous situations where online queries are performed in-process. There is also the fact that the paper used the average for latency which is a terrible metric to use. P95 is far more useful.

Cassandra's saving grace is that is is almost linearly scalable unlike many other solutions, but again, that is not the assertion being questioned.

benchmark by Juan Pi.

Hello.
How does Cassandra compare (performance and latency) against Aerospike, Hyperdex, FoundationDB or MonetDB?

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

12 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT