
Netflix Introduces Hollow, a Java Library for Processing In-Memory Datasets

by Michael Redlich on Jan 31, 2017. Estimated reading time: 3 minutes

Netflix recently introduced Hollow, a Java library and toolkit designed to efficiently cache datasets not characterized as “big data.” Such datasets may be metadata for e-commerce and search engines, or in the case of Netflix, metadata about movies and TV shows. Traditional solutions for processing such datasets include the use of a datastore or serialization, but typically suffer from reliability and latency issues. Hollow’s getting started guide summarizes the core concepts and nomenclature:

Hollow manages datasets which are built by a single producer, and disseminated to one or many consumers for read-only access. A dataset changes over time. The timeline for a changing dataset can be broken down into discrete data states, each of which is a complete snapshot of the data at a particular point in time.


The producer and the consumers handle datasets via a state engine that is transitioned between data states. A producer uses a write state engine and a consumer uses a read state engine.

Hollow replaces Netflix’s previous in-memory dataset framework, Zeno. Datasets are now represented with a compact, fixed-length, strongly typed encoding of the data. This encoding minimizes a dataset’s footprint and the encoded records are “packed into reusable slabs of memory that are pooled on the JVM heap to avoid impacting GC behavior on busy servers.”

Getting Started

To get started with a Hollow example, consider the following POJO:

public class Movie {
    long id;
    String title;
    int releaseYear;

    public Movie(long id, String title, int releaseYear) {
        this.id = id;
        this.title = title;
        this.releaseYear = releaseYear;
    }
}

A simple dataset based on the POJO above may be populated as follows:

List<Movie> movies = Arrays.asList(
    new Movie(1, "The Matrix", 1999),
    new Movie(2, "Beasts of No Nation", 2015),
    new Movie(3, "Goodfellas", 1990),
    new Movie(4, "Inception", 2010)
);

Hollow translates this movies list into its compact, fixed-length encoding layout. More details on this encoding can be found in the advanced topics section of the Hollow website.

The Producer

The first instance of a producer publishes an initial data state of a dataset (movies in this example) and consumers are notified on where to find that dataset. Subsequent changes to the dataset are systematically published and communicated to consumers.

A producer uses a HollowWriteStateEngine as a handle to a dataset:

    
HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
    

A HollowObjectMapper populates a HollowWriteStateEngine:

    
HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
for (Movie movie : movies) {
    objectMapper.addObject(movie);
}
    

The HollowObjectMapper is thread-safe, so records may be added from multiple threads in parallel.
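For instance, population could be parallelized with a Java 8 parallel stream. The following is a minimal sketch, not a definitive recipe: the Hollow class and method names follow the snippets in this article, and newer Hollow releases may name them differently.

```java
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.util.Arrays;
import java.util.List;

public class ParallelPopulation {

    // POJO mirroring the Movie class from this article
    static class Movie {
        long id;
        String title;
        int releaseYear;

        Movie(long id, String title, int releaseYear) {
            this.id = id;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    static HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();

    public static void main(String[] args) {
        HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);

        List<Movie> movies = Arrays.asList(
            new Movie(1, "The Matrix", 1999),
            new Movie(2, "Beasts of No Nation", 2015),
            new Movie(3, "Goodfellas", 1990),
            new Movie(4, "Inception", 2010));

        // addObject() is thread-safe, so records can be added concurrently
        movies.parallelStream().forEach(objectMapper::addObject);
    }
}
```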

The producer serializes the dataset as a binary blob and writes it to a defined output stream:

    
OutputStream os = new BufferedOutputStream(new FileOutputStream(snapshotFile));
HollowBlobWriter writer = new HollowBlobWriter(writeEngine);
writer.writeSnapshot(os);
    
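On subsequent cycles the producer typically publishes a delta rather than a full snapshot. The sketch below strings those steps together using prepareForNextCycle() and writeDelta() from the same low-level API; it is an illustration under assumptions, with byte-array streams standing in for real blob storage and method names following the Hollow version described in this article.

```java
import com.netflix.hollow.core.write.HollowBlobWriter;
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DeltaCycle {

    // POJO mirroring the Movie class from this article
    static class Movie {
        long id;
        String title;
        int releaseYear;

        Movie(long id, String title, int releaseYear) {
            this.id = id;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    static ByteArrayOutputStream snapshot = new ByteArrayOutputStream();
    static ByteArrayOutputStream delta = new ByteArrayOutputStream();

    public static void main(String[] args) throws IOException {
        HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
        HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
        HollowBlobWriter writer = new HollowBlobWriter(writeEngine);

        // Cycle 1: publish the initial data state as a snapshot
        objectMapper.addObject(new Movie(1, "The Matrix", 1999));
        writer.writeSnapshot(snapshot);

        // Cycle 2: re-add the full (updated) dataset for the new state ...
        writeEngine.prepareForNextCycle();
        objectMapper.addObject(new Movie(1, "The Matrix", 1999));
        objectMapper.addObject(new Movie(2, "Inception", 2010));

        // ... then write only the changes relative to the previous state
        writer.writeDelta(delta);
    }
}
```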

Generating an API for Consumers

A client API — the generated Java files through which a consumer accesses the data model — must be generated before writing the initial consumer source code:

    
HollowAPIGenerator codeGenerator = new HollowAPIGenerator(
    "MovieAPI",                                  // a name for the API
    "org.redlich.hollow.consumer.api.generated", // the package for generated API files
    writeEngine);                                // the producer's write state engine
codeGenerator.generateFiles(apiCodeFolder);
    

The Consumer

Once the consumer is notified of a published dataset, it uses a HollowReadStateEngine as a handle to the dataset:

    
HollowReadStateEngine readEngine = new HollowReadStateEngine();
    

A HollowBlobReader consumes a blob from the producer into a HollowReadStateEngine:

    
HollowBlobReader reader = new HollowBlobReader(readEngine);
InputStream is = new BufferedInputStream(new FileInputStream(snapshotFile));
reader.readSnapshot(is);
    
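When the producer later publishes a delta, the consumer applies it on top of its current read state with applyDelta(). The following is a self-contained sketch of the full round trip under stated assumptions: in-memory streams stand in for published blobs, and API names follow the Hollow version described in this article.

```java
import com.netflix.hollow.core.read.engine.HollowBlobReader;
import com.netflix.hollow.core.read.engine.HollowReadStateEngine;
import com.netflix.hollow.core.write.HollowBlobWriter;
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class ApplyDeltaExample {

    // POJO mirroring the Movie class from this article
    static class Movie {
        long id;
        String title;
        int releaseYear;

        Movie(long id, String title, int releaseYear) {
            this.id = id;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    static HollowReadStateEngine readEngine = new HollowReadStateEngine();

    public static void main(String[] args) throws IOException {
        // Producer side: snapshot for state 1, then a delta from state 1 to state 2
        HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
        HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
        HollowBlobWriter writer = new HollowBlobWriter(writeEngine);

        objectMapper.addObject(new Movie(1, "The Matrix", 1999));
        ByteArrayOutputStream snapshot = new ByteArrayOutputStream();
        writer.writeSnapshot(snapshot);

        writeEngine.prepareForNextCycle();
        objectMapper.addObject(new Movie(1, "The Matrix", 1999));
        objectMapper.addObject(new Movie(2, "Goodfellas", 1990));
        ByteArrayOutputStream delta = new ByteArrayOutputStream();
        writer.writeDelta(delta);

        // Consumer side: load the snapshot, then apply the delta in place
        HollowBlobReader reader = new HollowBlobReader(readEngine);
        reader.readSnapshot(new ByteArrayInputStream(snapshot.toByteArray()));
        reader.applyDelta(new ByteArrayInputStream(delta.toByteArray()));

        // The read state now reflects state 2 (two Movie records)
        System.out.println(readEngine.getTypeState("Movie")
                .getPopulatedOrdinals().cardinality());
    }
}
```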

The data within the dataset can be accessed via the generated API:

    
MovieAPI movieAPI = consumer.getAPI();
for (MovieHollow movie : movieAPI.getAllMovieHollow()) {
    System.out.println(movie._getId() + ", " +
        movie._getTitle()._getValue() + ", " +
        movie._getReleaseYear());
}
    

This prints the following output:

    
1, The Matrix, 1999
2, Beasts of No Nation, 2015
3, Goodfellas, 1990
4, Inception, 2010
    
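Beyond iterating over every record, consumers can also look records up by key. The sketch below uses HollowPrimaryKeyIndex together with Hollow's generic (non-generated) record API; the "Movie" type name and "id" field path follow Hollow's mapping of the POJO above, and the in-memory producer setup is only there to make the example self-contained.

```java
import com.netflix.hollow.api.objects.generic.GenericHollowObject;
import com.netflix.hollow.core.index.HollowPrimaryKeyIndex;
import com.netflix.hollow.core.read.engine.HollowBlobReader;
import com.netflix.hollow.core.read.engine.HollowReadStateEngine;
import com.netflix.hollow.core.write.HollowBlobWriter;
import com.netflix.hollow.core.write.HollowWriteStateEngine;
import com.netflix.hollow.core.write.objectmapper.HollowObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PrimaryKeyLookup {

    // POJO mirroring the Movie class from this article
    static class Movie {
        long id;
        String title;
        int releaseYear;

        Movie(long id, String title, int releaseYear) {
            this.id = id;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    static String title;

    public static void main(String[] args) throws IOException {
        // Produce a snapshot entirely in memory
        HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
        HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
        objectMapper.addObject(new Movie(2, "Beasts of No Nation", 2015));
        ByteArrayOutputStream snapshot = new ByteArrayOutputStream();
        new HollowBlobWriter(writeEngine).writeSnapshot(snapshot);

        // Consume it
        HollowReadStateEngine readEngine = new HollowReadStateEngine();
        new HollowBlobReader(readEngine)
                .readSnapshot(new ByteArrayInputStream(snapshot.toByteArray()));

        // Index Movie records by the id field, then look one up by key
        HollowPrimaryKeyIndex index = new HollowPrimaryKeyIndex(readEngine, "Movie", "id");
        int ordinal = index.getMatchingOrdinal(2L);

        if (ordinal != -1) {
            GenericHollowObject movie = new GenericHollowObject(readEngine, "Movie", ordinal);
            // String fields are references to String records with a "value" field
            title = movie.getObject("title").getString("value");
            System.out.println(title);
        }
    }
}
```

Production consumers would more likely use the generated MovieAPI from the previous section, but the generic API avoids a code-generation step in a short example.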

The entire Hollow project can be found on GitHub.

InfoQ recently featured a detailed interview with Drew Koszewnik, senior software engineer at Netflix and lead contributor to Hollow, regarding Hollow’s specific implementation details.
