
Q&A with Drew Koszewnik on a Disseminated Cache, Netflix Hollow


Netflix has announced Hollow, a general-purpose Java cache for metadata.

Caching is a frequently used technique for overcoming limitations, such as network latency and constrained bandwidth, that come with storing data in a centralized store. Developers often face trade-offs when caching data, such as keeping frequently used data locally while leaving less frequently used data in a remote data store. If all of the data to be accessed can be cached, however, implementations become much simpler. Hollow takes the approach of making the entire dataset available to each consumer, rather than distributing data across multiple servers and dealing with replication algorithms for data durability.

Hollow’s precursor, Zeno, stored data as Plain Old Java Objects (POJOs), which led to size limitations. Hollow instead uses a compact, fixed-length, strongly typed encoding of the data.

InfoQ caught up with Drew Koszewnik, senior software engineer at Netflix and lead contributor to the project, regarding the details.

InfoQ: How is this different from other caches, specifically memcached? If developers are already using a cache, what symptoms might prompt them to look into this brand-new cache?

Drew Koszewnik: Memcached does its job very well -- at Netflix we’ve built EVCache to leverage its enormous benefits. It is applicable to some problems that Hollow would struggle with due to the size of the cached data. On the other hand, Hollow solves some different problems more effectively than memcached -- especially when replicating an entire dataset across your fleet of servers is feasible. We’ve found that the set of situations in which that is feasible is broader than most engineers probably suspect. You might think of Hollow as a way to compress your dataset in memory while still providing O(1) access to any portion of it.

Memcached's main benefit is that it is a distributed caching system: it pools memory across many instances. But it’s still centralized, whereas Hollow is not really a distributed caching system; it’s rather a disseminated caching system. We’re choosing the word disseminated -- as opposed to distributed -- to indicate that an entire copy of the dataset is replicated on each consumer.

Hollow was created to fulfill Netflix’s requirement for an extremely high throughput cache. Most design decisions in Hollow ultimately stem from this need. Replicating the entire dataset across each consumer means we don’t have to make a network hop to grab any data, so there is negligible latency when accessing any portion of the dataset. Hollow is also very focused on reducing the CPU cost to access any portion of data across the breadth of the entire dataset. By requiring a strongly typed, structured data model, as opposed to a loosely-typed key/value store like memcached, the access cost issue can be addressed once and reused across many different use cases. We’ve leaned into building a strongly typed system and have built a variety of tooling that takes advantage of understanding the structure of the data. The history and diff tools are great examples of this -- because we know the data model we can tell exactly what changed between two different states, and build some really useful and generically applicable UI elements around surfacing those changes.

One happy consequence of having the data represented entirely locally in a structured manner is that you don’t have to anticipate exactly how the data will be used when designing the data model. With a system like memcached, you have to know in advance how consumer systems are going to use the data -- they can only query by the predefined key. With Hollow, consumer teams are free to develop their access patterns without producer teams becoming a development bottleneck. This decoupling of data modeling from access pattern development is also why indexing happens at the consumer in Hollow, rather than constructing preconceived indices at the producer.
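
As a rough sketch of this decoupling (the classes, type name, and fields below are illustrative, modeled on Hollow's quick start guide, and API details may differ by Hollow version), a producer team publishes a plain data model while a consumer team builds whatever index its own access pattern requires:

    import com.netflix.hollow.api.consumer.HollowConsumer;
    import com.netflix.hollow.core.index.HollowPrimaryKeyIndex;

    // Producer-side data model: plain classes define the record structure.
    class Movie {
        long id;
        String title;
        int releaseYear;
    }

    class MovieLookup {
        // Consumer-side index, built against the local in-memory copy of the dataset;
        // the producer never needs to know this access pattern exists. (In a real
        // application the index would be built once and refreshed, not per lookup.)
        static int findMovieOrdinal(HollowConsumer consumer, long movieId) {
            HollowPrimaryKeyIndex index =
                    new HollowPrimaryKeyIndex(consumer.getStateEngine(), "Movie", "id");
            return index.getMatchingOrdinal(movieId);   // ordinal of the match, or -1
        }
    }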

InfoQ: Is this for read-only data, single producer, and multiple consumers? Can you go into details of how the data is persisted?

Koszewnik: Yes, read-only data, single producer, and probably multiple consumers. The only persistence mechanism in Hollow is the blob store, which is just a file store -- it could be, for example, S3, NFS, or even an FTP server. Upon startup, consumers read an entire recent snapshot of the data store and bootstrap the dataset into memory. The dataset is kept up-to-date in memory via the application of deltas. On each consumer, the in-memory copy of the dataset is ephemeral -- if a consumer restarts, it needs to re-retrieve a snapshot from the blob store.

The actual format of the snapshot blob files is straightforward -- it’s largely the same structure as the in-memory layout, so initializing the data mostly consists of copying the blob contents directly into memory, which can be done quickly. This keeps initialization times low.
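
A minimal sketch of that consumer bootstrap, using the filesystem-based blob retriever and announcement watcher bundled with Hollow as a stand-in for S3 or NFS (constructor and package details may vary across Hollow versions):

    import com.netflix.hollow.api.consumer.HollowConsumer;
    import com.netflix.hollow.api.consumer.fs.HollowFilesystemAnnouncementWatcher;
    import com.netflix.hollow.api.consumer.fs.HollowFilesystemBlobRetriever;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    class ConsumerBootstrap {
        public static void main(String[] args) {
            Path blobStore = Paths.get("/tmp/hollow/movie-data");   // stand-in for S3/NFS/FTP

            HollowConsumer consumer = HollowConsumer
                    .withBlobRetriever(new HollowFilesystemBlobRetriever(blobStore))
                    .withAnnouncementWatcher(new HollowFilesystemAnnouncementWatcher(blobStore))
                    .build();

            // Loads the latest snapshot into local memory, then keeps it current by
            // applying deltas as new states are announced. After a restart, this
            // re-reads a snapshot from the blob store.
            consumer.triggerRefresh();
        }
    }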

InfoQ: Can you address the concerns outlined in the CAP theorem? For instance, could consumers be working with stale data during intermittent network connection issues?

Koszewnik: Because Hollow isn’t really a distributed data store in the classical sense, the CAP theorem has to be considered a bit differently in this case. I suppose you can say that Hollow excels in the areas of Availability and Partition Tolerance, while compromising on Consistency.

Since consumers each have an entire copy of the dataset in RAM, availability is second to none -- it isn’t at all susceptible to environmental issues once a consumer is initialized. As for Partition Tolerance -- there aren’t any partitions; the data is either totally present on a consumer or it’s not.

When it comes to Consistency things get more interesting. In Hollow, the timeline for a changing dataset must be broken down into discrete data states, each of which is a complete snapshot of the data at a particular point in time. These states get produced on a regular cadence, probably minutes apart. So there must be some tolerance for propagation delay with your dataset. There are things you can do to mitigate this issue -- for example sending urgent updates through a separate channel as overrides.
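
On the producer side of that timeline, each cycle produces one discrete, complete data state, and running it on a scheduled cadence yields the snapshot-and-delta chain consumers follow. A sketch, reusing the illustrative Movie class from above and a hypothetical loader for the source of truth (API details may differ by Hollow version):

    import com.netflix.hollow.api.producer.HollowProducer;
    import com.netflix.hollow.api.producer.fs.HollowFilesystemAnnouncer;
    import com.netflix.hollow.api.producer.fs.HollowFilesystemPublisher;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    class ProducerCycle {
        public static void main(String[] args) {
            Path blobStore = Paths.get("/tmp/hollow/movie-data");   // stand-in for S3/NFS/FTP

            HollowProducer producer = HollowProducer
                    .withPublisher(new HollowFilesystemPublisher(blobStore))
                    .withAnnouncer(new HollowFilesystemAnnouncer(blobStore))
                    .build();

            // One discrete, complete data state per cycle; schedule this every few minutes.
            producer.runCycle(state -> {
                for (Movie movie : loadAllMovies())   // hypothetical loader
                    state.add(movie);
            });
        }

        static List<Movie> loadAllMovies() {
            return List.of();   // placeholder -- read from the real source of truth
        }
    }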

InfoQ: This cache is not meant for just any data; it’s primarily for smaller metadata, correct? What happens in cases where you exhaust the memory resources on the local server?

Koszewnik: Correct, Hollow is primarily for small to medium sized data sets. We give the rule of thumb of MBs and GBs, not TBs or PBs. Before ruling out a dataset as too large for Hollow, consider that a JSON-based document store approaching 1 terabyte will take much less space when stored in Hollow. Hollow’s hyper-efficient storage combined with a well-tuned data model may turn your large dataset into a medium-sized one.

As Google’s Jeff Dean has taught, you should design for growth -- ensure that your system works with 10x or 20x growth, but the right solution for 100x isn’t necessarily optimal for x. This rule applies here. For growing datasets, Hollow provides tools for fine-grained analysis of heap usage to identify optimization targets. Comparing these metrics over time is useful for the early detection of potential problems. In particular, you want to look out for superlinear growth in a type; this can indicate a poorly optimized data model or a new, unaccounted-for axis of growth.

Data modeling is really important here. The way the data is structured will have an enormous impact on how well Hollow can deduplicate the dataset, and therefore how much compression is achieved. Whereas a generic compression algorithm finds patterns in the data for you with a Huffman Tree, with Hollow you specify the patterns using the record structure in your data model. It requires a bit of forethought up front but is the key to getting great compression with Hollow while maintaining high-throughput random access.
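
As a hypothetical illustration of that kind of forethought (not Netflix's actual model): pulling a frequently repeated group of fields into its own record type lets Hollow store each distinct combination once and have every referencing record share it:

    // Before: every Video record carries its own copy of these fields, so identical
    // codec/maxHeight pairs are stored over and over.
    class VideoFlat {
        long id;
        String title;
        String codec;
        int maxHeight;
    }

    // After: the shared fields live in their own record type. Hollow deduplicates
    // identical StreamProfile records, so a handful of distinct profiles are stored
    // once and referenced by many videos.
    class StreamProfile {
        String codec;
        int maxHeight;
    }

    class Video {
        long id;
        String title;
        StreamProfile profile;
    }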

InfoQ: Talk a little bit about the compression algorithm and what prompted you to move away from POJOs?

Koszewnik: Hollow achieves its compression in a number of different ways, including deduplication, encoding, packing, and elimination of overhead from Java Objects. The method and scope of deduplication is largely carried over from Zeno, but without stepping away from POJOs we wouldn’t have been able to achieve encoding, packing, and of course elimination of Object overhead.

Again focusing on performance, we needed a way to both minimize the CPU impact when accessing data in a Hollow dataset, and minimize the heap footprint required for representing that data. We ended up with a fixed-length bit-aligned representation for data. Each field will only use exactly the number of bits required to represent its maximum value across all records. We then use techniques like unaligned memory access to minimize the number of instructions required to retrieve data. One consequence of some of these optimizations is that we’ve built Hollow with x86-64 architecture in mind -- it currently requires little-endian byte ordering and it assumes that there’s no (or negligible) performance penalty for an unaligned read. Making some of these optimizations optional may be considered in the future for greater portability.
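
A small illustration of that fixed-length encoding arithmetic (the general idea, not Hollow's internal code):

    class BitPackingExample {
        // Bits needed for a fixed-length field that must hold values in [0, maxValue].
        static int bitsRequired(long maxValue) {
            return maxValue == 0 ? 1 : 64 - Long.numberOfLeadingZeros(maxValue);
        }

        public static void main(String[] args) {
            // A field whose largest value across all records is 1,000,000 packs into
            // 20 bits per record, versus 32 (int) or 64 (long) bits in a POJO field.
            System.out.println(bitsRequired(1_000_000));   // prints 20
        }
    }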

InfoQ: Two questions about Java. Why did you decide to go with Java for the implementation (and not Go for instance)? Can non-Java apps take advantage of Hollow?

Koszewnik: We built Hollow for Java because each of our direct consumer systems for video metadata is built using JVM-based languages. Non-JVM-based languages cannot currently take direct advantage of Hollow.

InfoQ: Final question. Is this specific to AWS and S3? Can you share some benchmarks and provide a roadmap?

Koszewnik: No, Hollow doesn’t provide or specify the infrastructure for actually disseminating the produced blobs from the producer to consumers. If potential users start with the quick start guide, it walks through plugging an AWS S3/DynamoDB-based infrastructure into an example project to create a production-scalable implementation. But it should be relatively easy to use those samples as a guide to create new implementations of the Publisher, Announcer, HollowBlobRetriever, and HollowAnnouncementWatcher for any infrastructure.
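
For other infrastructures, the producer-side pieces amount to two small interfaces; a hypothetical skeleton might look like the following (signatures follow the newer HollowProducer API rather than the HollowBlobRetriever/HollowAnnouncementWatcher names above and may differ across versions; the consumer-side retriever and watcher are implemented analogously):

    import com.netflix.hollow.api.producer.HollowProducer;

    // Pushes each published blob (snapshot, delta, reverse delta) to a custom store.
    class MyBlobStorePublisher implements HollowProducer.Publisher {
        @Override
        public void publish(HollowProducer.Blob blob) {
            // e.g. upload the blob's content, keyed by its version, to your storage
        }
    }

    // Records the latest announced state version where consumers can poll for it.
    class MyAnnouncer implements HollowProducer.Announcer {
        @Override
        public void announce(long stateVersion) {
            // e.g. write stateVersion to a small file, row, or pub/sub topic
        }
    }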

I’ve avoided providing benchmark results because they vary so widely depending on the data model. Results would be misleading in both directions depending on the use case. It’s probably sufficient to say that compared to POJOs, the tradeoff is that there’s a very slight penalty for accessing data from Hollow records with a big upside on the heap footprint reduction.

There is no strict roadmap -- there are a few things I’d really like to see tackled but from a broader perspective we’re hoping to get feedback from the community, especially related to usability, to see where our efforts can have the biggest improvement for the project.

The Hollow GitHub site provides more information on how to get started.

 
