
Netflix Drive: Building a Cloud-Native Filesystem for Media Assets


Key Takeaways

  • Media applications need the flexibility to work on a subset of assets drawn from a large corpus of petabytes to exabytes of data, and Netflix Drive provides that.
  • Netflix Drive is a multi-OS (Linux, macOS, Windows), multi-interface (POSIX + REST) cloud file system for media applications.
  • Netflix Drive contains abstractions for backend data and metadata stores, so developers can implement the appropriate interfaces and use Netflix Drive on cloud, on premise, or in hybrid environments.
  • Netflix Drive relies on security primitives such as 2FA to allow artists and applications to interact only with a pertinent subset of the large corpus of data.
  • Netflix Drive design allows developers to build shared workspaces, dynamic workspaces, and user workspaces on top of the base layer. Different types of applications, such as cloud drives, rendering, etc., leverage different workspaces.

Netflix Drive is a multi-interface, multi-OS cloud file system that intends to provide the look and feel of a typical POSIX file system on studio artists' workstations. 

It also behaves like a microservice in that it exposes REST endpoints. These support backend actions that many workflows use, as well as automated use cases in which users and applications do not deal directly with files and folders. The REST endpoints and the POSIX interface can coexist in any Netflix Drive instance; they are not mutually exclusive.

Netflix Drive has event-alerting backends configured as part of the framework. Events and alerts are first-class citizens in Netflix Drive.

We built Netflix Drive as a generic framework so that users can plug in different types of data and metadata stores. For example, you could run Netflix Drive with DynamoDB as the metadata-store backend and S3 as the data-store backend. You could also pair MongoDB and Ceph Storage as the metadata and data stores. For a more detailed presentation of this framework, you can watch the full video presentation.
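As a rough illustration of what that pluggability implies, the sketch below models the backend abstractions as C++ interfaces with trivial in-memory stand-ins; the type and method names are assumptions for illustration, not the actual Netflix Drive code.

```cpp
// Minimal sketch of pluggable backend abstractions (illustrative names,
// not the actual Netflix Drive interfaces).
#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Abstract metadata store: could be backed by DynamoDB, MongoDB, CockroachDB, ...
class MetadataStore {
public:
  virtual ~MetadataStore() = default;
  virtual void Put(const std::string& path, const std::string& attrs) = 0;
  virtual std::string Get(const std::string& path) = 0;
};

// Abstract data store: could be backed by S3, Ceph, a local media cache, ...
class DataStore {
public:
  virtual ~DataStore() = default;
  virtual void PutObject(const std::string& key, const std::vector<uint8_t>& bytes) = 0;
  virtual std::vector<uint8_t> GetObject(const std::string& key) = 0;
};

// Trivial in-memory implementations standing in for real adapters.
class InMemoryMetadataStore : public MetadataStore {
  std::map<std::string, std::string> entries_;
public:
  void Put(const std::string& path, const std::string& attrs) override { entries_[path] = attrs; }
  std::string Get(const std::string& path) override { return entries_[path]; }
};

class InMemoryDataStore : public DataStore {
  std::map<std::string, std::vector<uint8_t>> objects_;
public:
  void PutObject(const std::string& key, const std::vector<uint8_t>& bytes) override { objects_[key] = bytes; }
  std::vector<uint8_t> GetObject(const std::string& key) override { return objects_[key]; }
};

// A Netflix Drive instance is wired with whichever backend pair the
// deployment (cloud, on-premise, or hybrid) calls for.
struct DriveInstance {
  std::unique_ptr<MetadataStore> metadata;
  std::unique_ptr<DataStore> data;
};

int main() {
  DriveInstance drive{std::make_unique<InMemoryMetadataStore>(),
                      std::make_unique<InMemoryDataStore>()};
  drive.metadata->Put("/show/ep01/shot42.exr", R"({"size":1024,"owner":"artist1"})");
  drive.data->PutObject("shot42.exr#0", std::vector<uint8_t>(1024, 0));
  std::cout << drive.metadata->Get("/show/ep01/shot42.exr") << "\n";
}
```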

Why we built Netflix Drive

Netflix is, in general, pioneering the concept of an entertainment studio in the cloud. The idea is to allow artists to work and collaborate across the world. To do so, Netflix needs to provide a distributed, scalable, and performant platform infrastructure. 

At Netflix, assets are collections of files and folders with data and metadata that are stored and managed by disparate systems and services. 

From the starting point of ingestion, when cameras record video (produce the data), until the data makes its way to movies and shows, these assets get tagged with a variety of metadata by different systems, based on the workflow of the creative process. 

At the edge, where artists work with assets, the artists and their applications expect an interface that allows seamless access to these files and folders. This easy workflow is not restricted to artists; it extends to broader studio workflows. A great example is the asset transformations that happen during the rendering of content, which use Netflix Drive.

Studio workflows need to move assets across various stages of creative iterations. At each stage, an asset gets tagged with new metadata. We needed a system that could support the addition of different forms of metadata to the data. 

We also needed dynamic access control that can change at each stage, so that the platform projects only a certain subset of assets to certain applications, users, or workflows. We investigated AWS Storage Gateway, but its performance and security aspects did not meet our requirements.

We came up with the design of Netflix Drive to satisfy all of these considerations in multiple scenarios. The platform can serve as a simple POSIX file system that stores data on and retrieves data from the cloud, but it has a much richer control interface. It is a foundational piece of storage infrastructure that supports the needs of many Netflix studios and platforms.

Architecture of Netflix Drive

Netflix Drive has many interfaces as shown in figure 1. 

Figure 1: The basic architecture of Netflix Drive.

The POSIX interface (figure 2) allows simple file-system operations on files, such as creation, deletion, opening, renaming, moving, etc. This interface deals with the data and metadata operations on Netflix Drive. Files stored in Netflix Drive receive read, write, create, and other requests from the applications, users, and scripts or workflows that perform these operations, just as in any live file system.

Figure 2: The POSIX interface of Netflix Drive.

The other interface is the API interface (figure 3), which provides a controlled I/O interface. The API interface is of particular interest to workflow-management tools and agents because it exposes control operations on Netflix Drive. Many workflows used in the studio have some awareness of assets or files and want to control how those assets are projected onto the namespace. A simple example is when Netflix Drive starts up on a user's machine: the workflow tools will initially allow the user to view only a subset of the large corpus of data, and that is managed by these APIs. The APIs are also available for dynamic operations, such as uploading a particular file to the cloud or dynamically downloading a specific set of assets and attaching and revealing them at specific points in the namespace.
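To make the shape of this control surface concrete, here is a minimal sketch of such operations as a C++ interface; the operation names (stage, attach, publish) follow the descriptions in this article, but the signatures are assumptions rather than the real REST API.

```cpp
// Hypothetical control-surface sketch; operation names follow this article's
// descriptions, but the signatures are illustrative assumptions only.
#include <iostream>
#include <string>
#include <vector>

class DriveControl {
public:
  virtual ~DriveControl() = default;
  // Pull a set of assets from cloud storage into local staging.
  virtual void Stage(const std::vector<std::string>& assetIds) = 0;
  // Reveal staged assets at a specific location in the mounted namespace.
  virtual void Attach(const std::vector<std::string>& assetIds,
                      const std::string& namespacePath) = 0;
  // Explicitly upload a file from the mount point to cloud storage.
  virtual void Publish(const std::string& localPath,
                       const std::string& namespacePath) = 0;
};

// Stub that just logs what a workflow tool would ask Netflix Drive to do.
class LoggingControl : public DriveControl {
public:
  void Stage(const std::vector<std::string>& ids) override {
    for (const auto& id : ids) std::cout << "stage " << id << "\n";
  }
  void Attach(const std::vector<std::string>& ids, const std::string& ns) override {
    for (const auto& id : ids) std::cout << "attach " << id << " at " << ns << "\n";
  }
  void Publish(const std::string& local, const std::string& ns) override {
    std::cout << "publish " << local << " -> " << ns << "\n";
  }
};

int main() {
  LoggingControl ctl;
  ctl.Stage({"asset-123", "asset-456"});                 // download from the cloud
  ctl.Attach({"asset-123", "asset-456"}, "/show/ep01");  // reveal in the namespace
  ctl.Publish("/mnt/nfdrive/show/ep01/shot42.exr", "show/ep01");
}
```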

Figure 3: The API interface of Netflix Drive.

As mentioned, events (figure 4) are of primary importance in the Netflix Drive architecture, and events contain telemetry information. A great example of this is the use of audit logs that track all actions that different users have performed on a file. We might want services running in the cloud to consume audit logs, metrics, and updates. Our use of a generic framework allows different types of event backends to plug into the Netflix Drive ecosystem.

The events interface is also used to build on top of Netflix Drive. We can create notions of shared files and folders using this interface.

Figure 4: Events in Netflix Drive.
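As a hedged sketch, a pluggable event backend might look like the following; the event fields and backend types are assumptions for illustration, not the actual Netflix Drive event schema.

```cpp
// Illustrative event abstraction; field and type names are assumptions.
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct DriveEvent {
  std::string user;    // who performed the action
  std::string action;  // e.g. "create", "rename", "publish"
  std::string path;    // file or folder affected
  std::chrono::system_clock::time_point at;
};

// Abstract event backend: audit logs, metrics, or cloud services that react
// to namespace changes (e.g. to build shared workspaces).
class EventBackend {
public:
  virtual ~EventBackend() = default;
  virtual void Emit(const DriveEvent& e) = 0;
};

class ConsoleAuditLog : public EventBackend {
public:
  void Emit(const DriveEvent& e) override {
    std::cout << e.user << " " << e.action << " " << e.path << "\n";
  }
};

// Events fan out to every configured backend.
class EventBus {
  std::vector<std::unique_ptr<EventBackend>> backends_;
public:
  void AddBackend(std::unique_ptr<EventBackend> b) { backends_.push_back(std::move(b)); }
  void Publish(const DriveEvent& e) {
    for (auto& b : backends_) b->Emit(e);
  }
};

int main() {
  EventBus bus;
  bus.AddBackend(std::make_unique<ConsoleAuditLog>());
  bus.Publish({"artist1", "create", "/show/ep01/shot42.exr",
               std::chrono::system_clock::now()});
}
```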

The data-transfer layer (figure 5) is an abstraction that deals with transferring data out of Netflix Drive to multiple tiers of storage and different types of interfaces. It also brings files into a Netflix Drive mount point on an artist's workstation or machine.

Figure 5: Data transfer in Netflix Drive.

For performance reasons, Netflix Drive does not deal with sending the data directly to the cloud. We want Netflix Drive to perform like a local file system as much as possible. So we use local storage, if available, to store the files and then have strategies for moving the data from the local storage to cloud storage. 

We typically use two ways to move data to the cloud. First, the control interface exposes APIs that workflows invoke dynamically to move a subset of the assets to the cloud. The other is auto-sync, the ability to automatically sync all local files with the files in cloud storage, in the same way that Google Drive stores your files. For this, we have different tiers of cloud storage. Figure 5 particularly notes Media Cache and Baggins: Media Cache is a region-aware caching tier that brings data closer to the edge user, and Baggins is our layer on top of S3 that deals with chunking and encrypting content.
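A minimal sketch of that tiered movement is shown below, assuming hypothetical tier objects that mirror the local store, media cache, and object store described here; it is not the actual sync engine.

```cpp
// Sketch of tiered data movement: dirty files drain from local storage
// through a region-aware cache toward object storage. Tier names and the
// scheduling policy are illustrative assumptions.
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct Tier {
  std::string name;
  void Upload(const std::string& file) {
    std::cout << "uploading " << file << " to " << name << "\n";
  }
};

class AutoSync {
  std::queue<std::string> dirty_;  // files modified locally
  std::vector<Tier> tiers_;        // ordered: closest to farthest
public:
  explicit AutoSync(std::vector<Tier> tiers) : tiers_(std::move(tiers)) {}
  void MarkDirty(const std::string& file) { dirty_.push(file); }

  // Called periodically (auto-sync) or on demand (explicit API).
  void Flush() {
    while (!dirty_.empty()) {
      const std::string file = dirty_.front();
      dirty_.pop();
      for (auto& tier : tiers_) tier.Upload(file);  // media cache -> object store
    }
  }
};

int main() {
  AutoSync sync({{"media-cache"}, {"s3"}});
  sync.MarkDirty("/mnt/nfdrive/show/ep01/shot42.exr");
  sync.Flush();
}
```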

Overall, the Netflix Drive architecture has the POSIX interface for data and metadata operations. The API interface deals with different types of control operations. The event interface tracks all the state-change updates. The data-transfer interface abstracts moving the bits in and out of Netflix Drive to the cloud.

Anatomy of Netflix Drive

The Netflix Drive schema (figure 6) consists of three layers: the interface, the storage backend, and the transport service.

Figure 6: The anatomy of Netflix Drive.

The top interface layer has all the FUSE file handlers alongside the REST endpoints. 

The middle layer is the storage backend layer. Remember that Netflix Drive provides a framework into which you can plug and play different types of storage backends. Here we have the abstract metadata interface and the abstract data interface. In our first iteration, we have used CDrive as our metadata store. CDrive is Netflix’s own studio-asset-aware metadata store. As mentioned, Baggins is Netflix's S3 datastore layer that chunks and encrypts content before pushing it to S3. 

Intrepid is the transport layer that transfers the bits to and from Netflix Drive. It is an internally developed high-leverage transport protocol used by many Netflix applications and services to transfer data from one service to another. Intrepid not only transports the data but also transfers some aspects of the metadata store, because we need the ability to save some state of the metadata store in the cloud.

Figure 7: Abstraction layers of Netflix Drive.

Figure 7 shows the abstraction layers in Netflix Drive. 

Because we are using a FUSE-based file system, libfuse handles the different file-system operations. We start Netflix Drive and bootstrap it with a manifest, along with the REST APIs and control interface. 
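For readers unfamiliar with libfuse, the toy example below shows the general shape of FUSE callbacks, serving a single read-only file from memory. It uses the Linux libfuse 3 API and illustrates only the mechanism, not Netflix Drive's actual handlers, which sit on top of real metadata and data backends.

```cpp
// Toy libfuse 3 example: serves one read-only, in-memory "asset" to show the
// shape of FUSE callbacks. Build: g++ toy_fuse.cc $(pkg-config fuse3 --cflags --libs)
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <sys/stat.h>
#include <algorithm>
#include <cerrno>
#include <cstring>
#include <string>

static const std::string kPath = "/asset.txt";
static const std::string kData = "hello from the (pretend) cloud\n";

static int drive_getattr(const char* path, struct stat* st, struct fuse_file_info*) {
  std::memset(st, 0, sizeof(*st));
  if (std::string(path) == "/") { st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0; }
  if (path == kPath) { st->st_mode = S_IFREG | 0444; st->st_nlink = 1; st->st_size = kData.size(); return 0; }
  return -ENOENT;
}

static int drive_readdir(const char* path, void* buf, fuse_fill_dir_t fill, off_t,
                         struct fuse_file_info*, enum fuse_readdir_flags) {
  if (std::string(path) != "/") return -ENOENT;
  fill(buf, ".", nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
  fill(buf, "..", nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
  fill(buf, kPath.c_str() + 1, nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
  return 0;
}

static int drive_open(const char* path, struct fuse_file_info*) {
  return path == kPath ? 0 : -ENOENT;
}

static int drive_read(const char* path, char* buf, size_t size, off_t off,
                      struct fuse_file_info*) {
  if (path != kPath) return -ENOENT;
  if (off >= static_cast<off_t>(kData.size())) return 0;
  size_t n = std::min(size, kData.size() - static_cast<size_t>(off));
  std::memcpy(buf, kData.data() + off, n);
  return static_cast<int>(n);
}

int main(int argc, char* argv[]) {
  struct fuse_operations ops{};
  ops.getattr = drive_getattr;
  ops.readdir = drive_readdir;
  ops.open = drive_open;
  ops.read = drive_read;
  return fuse_main(argc, argv, &ops, nullptr);
}
```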

The abstraction layer abstracts the default metadata stores and the datastores. We can have different types of data and metadata stores — in this particular example, we have the CockroachDB adapter as the metadata store and an S3 adapter as the data store. We can also use different types of transfer protocols, and they are part of a plug-and-play interface in Netflix Drive. The protocol layer can be REST or gRPC. Finally, we have the actual storage of data.

Figure 8: Netflix Drive services on the workstation and in the cloud.

Figure 8 shows how the services are split between the local workstation and the cloud. 

The workstation machine has the typical Netflix Drive API and POSIX interface. Netflix Drive on a local workstation will use the transport agent and library to talk to the metadata store and the data store.

The cloud services contain the metadata store, which is CDrive at Netflix. A media cache serves as an intermediate tier of storage. S3 provides object storage. 

Note that we also use local storage to cache reads and writes, in order to deliver the performance that users expect from Netflix Drive. 

Security is a concern in Netflix Drive. Many applications use these cloud services; they front all of the corpus of assets in Netflix. It is essential to secure these assets and to allow only users with proper permissions to view the subset of assets that they are allowed to access. So, we use two-factor authentication on Netflix Drive.

Security is built as a layer on top of our CockroachDB. Netflix Drive leverages several security services that are built within Netflix at this point. We don't have external security APIs that we can plug in. We plan to abstract them out before we release any open-source version so that anyone can build pluggable modules to handle that.

Typical lifecycle of Netflix Drive

Given the ability of Netflix Drive to dynamically present namespaces and bring together disparate data stores and metadata stores, it is essential to consider its lifecycle. 

We initially bootstrap Netflix Drive using a manifest, and that initial manifest could be empty. We have the ability to allow workstations or workflows to download assets from the cloud and preload the Netflix Drive mount point with this content. Workflows and artists then modify these assets, and Netflix Drive periodically snapshots them through explicit APIs or uses the auto-sync feature to upload them back to the cloud.

During the bootstrap process, Netflix Drive typically expects a mount point to be specified. It uses the user’s identity for authentication and authorization. It establishes the location of the local storage, where the files will be cached, and the endpoints of the cloud metadata store and data store. The manifest contains optional fields for preloading content. 

Different types of applications and workflows use Netflix Drive, and the persona of each supplies it with its particular flavor. For example, one application may rely specifically on the REST control interface because it is aware of the assets and so will explicitly use APIs to upload files to the cloud. Another application may not necessarily know when to upload files to the cloud, so it would rely on the auto-sync feature to upload files in the background. These are the sorts of alternatives that each persona of Netflix Drive defines.

Figure 9: A sample bootstrap manifest for Netflix Drive.

Figure 9 shows a sample bootstrap manifest. After defining the local storage, the manifest defines the Netflix Drive instances. Each mount point can have several distinct instances of Netflix Drive, and here we see two in use: a dynamic instance and a user instance, each with different backend data stores and metadata stores. The dynamic instance uses a Redis metadata store and S3 for the data store. The user instance uses CockroachDB as the metadata store and Ceph for the data store. Netflix Drive assigns a unique identity to each workspace for data persistence.
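To give a feel for the shape of such a manifest, here is a sketch of the configuration modeled as plain C++ structures; the field names mirror the description above but are assumptions, not the real manifest format.

```cpp
// Illustrative model of a bootstrap manifest; field names are assumptions
// that mirror the description above, not Netflix Drive's real schema.
#include <iostream>
#include <string>
#include <vector>

struct InstanceConfig {
  std::string name;           // e.g. "dynamic" or "user"
  std::string metadataStore;  // e.g. "redis", "cockroachdb"
  std::string dataStore;      // e.g. "s3", "ceph"
  std::string workspaceId;    // unique identity used for data persistence
};

struct BootstrapManifest {
  std::string mountPoint;            // where the namespace is exposed
  std::string localStorage;          // local cache location for reads/writes
  std::string identity;              // user identity for authn/authz
  std::vector<std::string> preload;  // optional assets to pre-download
  std::vector<InstanceConfig> instances;
};

int main() {
  BootstrapManifest m{
      "/mnt/nfdrive", "/var/nfdrive/cache", "artist1@example.com", {},
      {{"dynamic", "redis", "s3", "ws-dynamic-001"},
       {"user", "cockroachdb", "ceph", "ws-user-042"}}};
  for (const auto& i : m.instances)
    std::cout << i.name << ": " << i.metadataStore << " + " << i.dataStore << "\n";
}
```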

Figure 10: Statically setting up a Netflix Drive namespace.

The namespace of Netflix Drive is all the files that are viewed inside it. Netflix Drive can create the namespace statically or dynamically. The static method (figure 10) specifies at bootstrap time the exact files to pre-download to the current instance. For this, we present a file session and container information. Workflows can pre-populate a Netflix Drive mount point with files, so that subsequent workflows can then be built on top of it.

The dynamic way to create a namespace is to call Netflix Drive APIs in the REST interface (figure 11). In this case, we use the stage API to stage the files and pull them from cloud storage, then attach them to specific locations in the namespace. These static and dynamic interfaces are not mutually exclusive.

Figure 11: Dynamically setting up a Netflix Drive namespace.

Updating content

POSIX operations on Netflix Drive can open/close, move, read/write, and otherwise manipulate files. 

A subset of REST APIs can also modify a file: for example, an API can stage a file, pulling it from the cloud; one can checkpoint a file; and one can save a file, which explicitly uploads it to cloud storage. 

Figure 12 is an example of how a file is uploaded to the cloud, with the Publish API. We can autosave files, which would periodically checkpoint the files to the cloud, and we have the ability to perform an explicit save. The explicit save would be an API that different workflows invoke to publish content.

Figure 12: The Publish API of Netflix Drive.

A great example of the use of different APIs is when artists are working on a lot of ephemeral data. Much of this data does not have to make it to the cloud because it is work in progress and not a final product. For such workflows, an explicit save is the right call, rather than autosaving in the Google Drive way. Once an artist is sure the content has reached a point where it can be shared with other artists or workflows, they invoke this API to save it to the cloud. The API will snapshot the selected files in the artist’s Netflix Drive mount point and deliver them to the cloud, storing them under the appropriate namespace.
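The sketch below illustrates the snapshot-then-upload idea behind an explicit save, with a stand-in in-memory "cloud"; the class and method names are assumptions, not the actual Publish API.

```cpp
// Sketch of an explicit-save path: snapshot the selected files at publish
// time, then upload the frozen copies under a namespace. Names are
// illustrative; this is not the actual Publish API.
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Bytes = std::string;  // stand-in for file contents

class PublishApi {
  std::map<std::string, Bytes> cloud_;  // namespace path -> published bytes
public:
  // Freeze the current contents of each selected file, then upload; later
  // local edits do not affect what was published.
  void Publish(const std::map<std::string, Bytes>& localFiles,
               const std::string& ns,
               const std::vector<std::string>& selection) {
    for (const auto& rel : selection) {
      Bytes snapshot = localFiles.at(rel);           // point-in-time copy
      cloud_[ns + "/" + rel] = std::move(snapshot);  // store under the namespace
      std::cout << "published " << ns << "/" << rel << "\n";
    }
  }
};

int main() {
  std::map<std::string, Bytes> workdir{{"shot42.exr", "v3 pixels"}};
  PublishApi api;
  api.Publish(workdir, "show/ep01", {"shot42.exr"});
}
```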

Lessons

Multiple personas use Netflix Drive in different types of workflows, which taught us a lot while developing it. We encountered several points at which we had to consider choices for our architecture.

The performance and latency requirements of files, workflows, and artist workstations, and the experience that we wanted to provide to artists who would use Netflix Drive, dictated many of our architectural choices. We implemented much of the code in C++. We compared languages and concluded that C++ gave us the best performance, which was critical to the experience we wanted to provide. We did not consider using Rust because, at the time, Rust did not sufficiently support the FUSE file system.

We always intended Netflix Drive to be a generic framework that could accept any data store and metadata store that someone wanted to plug into it. Designing a generic framework for several operating systems is difficult. After investigating alternatives, we decided to support Netflix Drive on CentOS, macOS, and Windows with a FUSE-based file system. That multiplied our testing matrix and our supportability matrix.

We work with disparate backends and have different layers of caching and tiering, and we rely on cached metadata operations. We built Netflix Drive to serve exabytes of data and billions of assets. Designing for scalability was one of the cornerstones of the architecture. We often think that the bottleneck of scaling a solution on the cloud would be the data store, but we learned that the metadata store is the bottleneck for us. The key to our scalability is handling metadata. We focused a lot on metadata management, on reducing the number of calls to metadata stores. Caching a lot of that data locally improved performance for the studio applications and workflows that are often metadata heavy. 
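As an illustration of that idea, a read-through cache in front of the metadata store might look like the sketch below; it deliberately ignores invalidation and consistency, which the real system has to handle.

```cpp
// Read-through metadata cache sketch: serve repeated lookups locally and only
// fall through to the (remote) metadata store on a miss.
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

class CachedMetadata {
  std::unordered_map<std::string, std::string> cache_;
  std::function<std::string(const std::string&)> fetchRemote_;  // expensive call
public:
  explicit CachedMetadata(std::function<std::string(const std::string&)> fetch)
      : fetchRemote_(std::move(fetch)) {}

  std::string Get(const std::string& path) {
    auto it = cache_.find(path);
    if (it != cache_.end()) return it->second;  // cache hit: no remote call
    std::string attrs = fetchRemote_(path);     // cache miss: one remote call
    cache_.emplace(path, attrs);
    return attrs;
  }
};

int main() {
  int remoteCalls = 0;
  CachedMetadata md([&](const std::string& path) {
    ++remoteCalls;
    return std::string("{\"path\":\"") + path + "\"}";
  });
  md.Get("/show/ep01/shot42.exr");
  md.Get("/show/ep01/shot42.exr");  // served from the local cache
  std::cout << "remote calls: " << remoteCalls << "\n";  // prints 1
}
```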

We explored having file systems in the cloud, like EFS, but a file system cannot scale beyond a point without impacting performance. To serve billions of assets, we need to use some form of object store rather than a file store. That means the files our artists are familiar with have to be translated into objects. The simplest approach is a one-to-one mapping between every file and an object, but that is simplistic because file sizes might exceed the maximum supported object size. We want the ability to map one file to multiple objects: if an artist changes one pixel in a file, Netflix Drive can then change only the object that holds the relevant chunk of the file. Building that translation layer was a tradeoff that we accepted for scalability. 

The use of objects brings up the issues of deduplication and chunking. Object stores use versioning: every change to an object, no matter how small, creates a new version of the object. Traditionally, changing one pixel of a file means sending the entire file and rewriting it as an object; you cannot just send a delta and apply it on cloud stores. By chunking one file into many objects, we reduce the amount of data that has to be sent to the cloud. Choosing the appropriate chunk size is more of an art than a science, because many smaller chunks mean a lot of data and translation logic to manage, and the amount of metadata increases. Encryption is another consideration: we encrypt each chunk, so more, smaller chunks lead to many more encryption keys and the metadata for them. Chunk size in Netflix Drive is configurable.
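To make the tradeoff concrete, the sketch below shows a fixed-size chunking scheme along these lines: a file maps to several objects, and an edit dirties only the chunks it touches. The chunk size and key naming are assumptions for illustration.

```cpp
// Fixed-size chunking sketch: a file maps to N objects, and an edit only
// dirties the chunk(s) it touches, so only those objects are re-uploaded.
// The 4 MiB chunk size and the key naming scheme are assumptions.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

constexpr std::size_t kChunkSize = 4 * 1024 * 1024;  // configurable in practice

// Object keys for a file of `fileSize` bytes: "<file>#<chunkIndex>".
std::vector<std::string> ChunkKeys(const std::string& file, std::size_t fileSize) {
  std::vector<std::string> keys;
  const std::size_t chunks = (fileSize + kChunkSize - 1) / kChunkSize;
  for (std::size_t i = 0; i < chunks; ++i)
    keys.push_back(file + "#" + std::to_string(i));
  return keys;
}

// Which chunks does an edit of [offset, offset + length) touch?
std::vector<std::size_t> DirtyChunks(std::size_t offset, std::size_t length) {
  std::vector<std::size_t> dirty;
  if (length == 0) return dirty;
  for (std::size_t i = offset / kChunkSize; i <= (offset + length - 1) / kChunkSize; ++i)
    dirty.push_back(i);
  return dirty;
}

int main() {
  // A 100 MiB file becomes 25 objects of 4 MiB each.
  auto keys = ChunkKeys("shot42.exr", 100 * 1024 * 1024);
  std::cout << keys.size() << " objects\n";
  // Changing a few bytes at offset 10 MiB re-uploads only chunk 2.
  for (auto i : DirtyChunks(10 * 1024 * 1024, 16)) std::cout << "re-upload chunk " << i << "\n";
}
```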

Having multiple tiers of storage can improve performance. When we designed Netflix Drive, we did not restrict ourselves to only local storage or cloud storage. We wanted to build it so that different tiers of storage can easily be added to the Netflix Drive framework. That came through in our design, in our architecture, and in our code. For example, our media cache is nothing but a media store, a caching layer that is closer to the user and the applications. Netflix Drive caches a lot of data on the local file store, which Google Drive doesn't do, and so we always get better local file-system performance compared to it.

This was another reason why we rejected AWS Storage Gateway. If multiple artists are working on an asset and every iteration of that asset is stored in the cloud, our cloud costs would explode. We wanted these assets to be stored close to the user in media caches, which we own, and to control when the final copy goes to the cloud. We can take advantage of such a hybrid infrastructure, and these parameters were not available to us through AWS Storage Gateway.

Having a stacked approach to software architecture was critical. A great example is the idea of shared namespaces. We are currently working on the ability to share files between different workstations or between different artists. We are building this on top of our eventing framework, which we designed as part of the Netflix Drive architecture itself. When a user on one Netflix Drive instance adds a file to a specific namespace, it generates an event that different cloud services may consume. Netflix Drive then uses the REST interface to inject that file into other Netflix Drive instances that access that namespace. 

If you would like to learn more about Netflix Drive, we have a tech blog available on the Netflix Technology Blog channel. 

We are working towards open-sourcing Netflix Drive over the next year. Many folks who are trying to build studios in the cloud have reached out to us. They want to use Netflix Drive, the open-source version of it, and build pluggable modules for their use cases. We intend to prioritize this.
