BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Cassandra Mythology

Cassandra Mythology

Lire ce contenu en français

Bookmarks

Like the prophetess of Troy it was named for, Apache Cassandra has seen some myths accrue around it. Like most myths, these were once at least partly true, but have become outdated as Cassandra evolved and improved. In this article, I'll discuss five common areas of concern and clarify the confusion.

Myth: Cassandra is a map of maps

As applications using Cassandra became more complex, it became clear that schema and data typing make development and maintenance much easier at scale than "everything is a bytebuffer," or "everything is a string."

Today, the best way to think of Cassandra's data models is as tables and rows. Similar to a relational database, Cassandra columns are strongly typed and indexable.

Other things you might have heard:

  • "Cassandra is a column database." Column databases store all values for a given column together on disk. This makes them suitable for data warehouse workloads, but not for running applications that require fast access to specific rows.
  • "Cassandra is a wide-row database." There is a grain of truth here, which is that Cassandra's storage engine is inspired by Bigtable, the grandfather of wide-row databases. But wide-row databases tie their data model too closely to that storage engine, which is easier to implement but more difficult to develop against, and prevents many optimizations.

One of the reasons we shied away from "tables and rows" to start with is that Cassandra tables do have some subtle differences from the relational ones you're familiar with. First, the first element of a primary key is the partition key. Rows in the same partition will be owned by the same servers, and partitions are clustered.

Second, Cassandra does not support joins or subqueries, because joins across machines in a distributed system are not performant. Instead, Cassandra encourages denormalization to get the data you need from a single table, and provides tools like collections to make this easier.

For example, consider the users table shown in the following code example:

CREATE TABLE users (
user_id uuid PRIMARY KEY,
name text,
state text,
birth_year int
);

Most modern services understand now that users have multiple email addresses. In the relational world, we'd add a many-to-one relationship and correlate addresses to users with a join, like the following example:

CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);
SELECT *
FROM users NATURAL JOIN users_addresses;

In Cassandra, we'd denormalize by adding the email addresses directly to the users table. A set collection is perfect for this job:

ALTER TABLE users ADD email_addresses set<text>;

We can then add addresses to a user record like this:

UPDATE users
SET email_addresses = {‘jbe@gmail.com’, ‘jbe@datastax.com’}
WHERE user_id = ‘73844cd1-c16e-11e2-8bbd-7cd1c3f676e3’

See the documentation for more on the Cassandra data model, including self-expiring data and distributed counters.

Myth: Cassandra is slow at reads

While Cassandra's log-structured storage engine means that it does not seek for updates on hard disks or cause write amplification on solid-state disks, it is also fast at reads.

Here are the throughput numbers from the random-access read, random-access and sequential-scan, and mixed read/write workloads from the University of Toronto's NoSQL benchmark results:

The Endpoint benchmark comparing Cassandra, HBase, and MongDB corroborates these results.

How does this work? At a very high level, Cassandra's storage engine looks similar to Bigtable, and uses some of the same terminology. Updates are appended to a commitlog, then collected into a "memtable" that is eventually flushed to disk and indexed, as an "sstable:"

Naive log-structured storage engines do tend to be slower at reads, for the same reason they are fast at writes: new values in rows do not overwrite the old ones in-place, but must be merged in the background by compaction. So in the worst case, you will have to check multiple sstables to retrieve all the columns for a "fragmented" row:

Cassandra makes several improvements to this basic design to achieve good read performance:

Myth: Cassandra is hard to run

There are three aspects to running a distributed system that tend to be more complicated than running a single-machine database:

  1. Initial deployment and configuration
  2. Routine maintenance such as upgrading, adding new nodes, or replacing failed ones
  3. Troubleshooting

Cassandra is a fully distributed system: every machine in a Cassandra cluster has the same role. There are no metadata servers that have to fit everything in memory. There are no configuration servers to replicate. There are no masters, and no failover. This makes every aspect of running Cassandra simpler than the alternatives. It also means that bringing up a single-node cluster to develop and test against is trivial, and behaves exactly the way a full cluster of dozens of nodes would.

Initial deployment is the least important concern here in one sense: other things being equal, even a relatively complex initial setup will be insignificant when amortized over the lifetime of the system, and automated installation tools can hide most of the gory details. But! If you can barely understand a system well enough to install it manually, there's going to be trouble when you need to troubleshoot a problem, which requires much more intimate knowledge of how all the pieces fit together.

Thus, my advice would be to make sure you understand what is going on during installation, as in this two-minute example of setting up a Cassandra cluster, before relying on tools like the Windows MSI installer, OpsCenter provisioning or the self-configuring AMI.

Cassandra makes routine maintenance easy. Upgrades can be done one node at a time. While a node is down, other nodes will save updates that it missed and forward them when it comes back up. Adding new nodes is parallelized across the cluster; there is no need to rebalance afterwards.

Even dealing with longer, unplanned outages is straightforward. Cassandra's active repair is like rsync for databases, only transferring missing data and keeping network traffic minimal. You might not even notice anything happened if you're not paying close attention.

Cassandra's industry leading support for multiple datacenters even makes it straightforward to survive an entire AWS region going down or losing a datacenter to a hurricane.

Finally, DataStax OpsCenter simplifies troubleshooting by making the most important metrics in your cluster available at a glance, allowing you to easily correlate historical activity with the events causing service degradation. The DataStax Community Edition Cassandra distribution includes a "lite" version of OpsCenter, free for production use. DataStax Enterprise also includes scheduled backup and restore, configurable alerts, and more.

Myth: Cassandra is hard to develop against

The original Cassandra Thrift API achieved its goal of giving us a cross-platform base for a minimum of effort, but the result was admittedly difficult to work with. CQL, Cassandra's SQL dialect, replaces that with an easier interface, a gentler learning curve, and an asynchronous protocol.

CQL has been available for early adopters beginning with version 0.8 two years ago; with the release of version 1.2 in January, CQL is production ready, with many drivers available and better performance than Thrift. DataStax is also officially supporting the most popular CQL drivers, which helps avoid the sometimes indifferent support seen with the community Thrift drivers.

Patrick McFadin's Next Top Data Model presentations (one, two) are a good introduction to CQL beyond the basics in the documentation.

Myth: Cassandra is still bleeding edge

From an open source perspective, Apache Cassandra is now almost five years old and has many releases under its belt, with version 2.0 coming up in July. From an enterprise point of view, DataStax provides DataStax Enterprise, which includes a certified version of Cassandra that has been specifically tested, benchmarked, and approved for production environments.

Businesses have seen the value that Cassandra brings to their organizations. Over 20 of the Fortune 100 rely on Cassandra to power their mission-critical applications in nearly every industry, including financial, health care, retail, entertainment, online advertising and marketing.

The most common reason to move to Cassandra is when the existing technology can't scale to the demands of modern big data applications. Netflix, the largest cloud application in the world, moved 95% of their data from Oracle to Cassandra. Barracuda Networks replaced MySQL with Cassandra when MySQL couldn't handle the volume of requests needed to combat modern spammers. Ooyala handles two billion data points every day, on a Cassandra deployment of more than two petabytes.

Cassandra is also augmenting or replacing legacy relational databases that have proven either too costly to manage and maintain. Constant Contact's initial project with Cassandra took three months and $250,000, compared to nine months and $2,500,000 on their traditional RDBMS. Today they have six clusters and more than 100TB of data trusted to Cassandra.

Many other examples can be found in DataStax's case studies and Planet Cassandra’s user interviews.

Not a myth: the 2013 Cassandra Summit in San Francisco

We just completed the best way to learn more about Cassandra, with over 1100 attendees and 65 sessions from Accenture, Barracuda Networks, Blue Mountain Capital, Comcast, Constant Contact, eBay, Fusion-io, Intuit, Netflix, Sony, Splunk, Spotify, Walmart, and more. Slides are already up; follow Planet Cassandra to be notified when videos are available.
 

About the Author

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, he worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.

Rate this Article

Adoption
Style

BT