Michelle Brenner Builds Netflix Workstations and Enables Artists to Create From Anywhere

Producing television shows and movies at Netflix-scale (i.e. one new movie per week instead of one or two per year) means having a way to efficiently work with many artists and content creators. Netflix Workstations were created as a cloud-based solution to provide artists with secure access to the applications and content they need to complete their work. On this episode of the podcast, Thomas Betts talks with Michelle Brenner about the benefits and trade-offs of the solution that enables artists to create from anywhere.

Key Takeaways

  • Security is a primary concern for any production house, but existing options of bringing everyone on site or managing the secure transfer of equipment would not scale to meet the needs of Netflix Studios.
  • Because the primary users of Netflix Workstations are artists, support for peripherals such as Wacom tablets and multiple, high-resolution monitors was necessary. The team worked with NICE DCV and Teradici to ensure a great user experience.
  • The configuration options for the workstations have evolved several times. What started as a “white glove” approach to onboard someone has become a self-serve platform, allowing a user to choose what they need to get their job done.
  • The fully cloud-native solution meant the team could benefit from established Netflix best practices around file storage, backups, and security.
  • The need to add remote workers was a priority when the project began in 2019. The shift to fully remote teams served to accelerate the timeline, but was not a primary driver for starting the project.

 

Transcript

Intro [00:30]

Thomas Betts: Hello, and thank you for joining us for another episode of The InfoQ Podcast. I'm Thomas Betts, co-host of the podcast, Lead Editor for Architecture and Design at InfoQ, and a Senior Principal Software Architect at Blackbaud. Today I'm speaking with Michelle Brenner, a Senior Software Engineer at Netflix, about Netflix workstations. It's an interesting story about allowing artists to create from anywhere, whether that's in an office in Mumbai or at their house in Vancouver. Michelle, welcome to The InfoQ Podcast.

Michelle Brenner: Thank you for having me. It's great to be here.

Business Reasons for Netflix Workstations [00:56]

Thomas Betts: So, at Netflix you specifically work at Netflix Studios, is that right?

Michelle Brenner: I work at Netflix, and I work for the content creators. So, the artists that are working on visual effects, animation, all that stuff, working on tools to help them do their work.

Thomas Betts: And, as I understand, Netflix Studios is basically the production house for all the Netflix-exclusive shows. And like you said, you work with the content creators, and you helped create Netflix Workstations. I said, it's a tool for them to help do their job. But, I have a naïve view of what artists and content creators do. It seems like they have a computer that has a better graphics card than I do, but I know it's not that simple. What was the motivation behind the project? What are the business and technology problems that existed that you had to come up with this as a solution?

Michelle Brenner: Netflix basically wants to make more content than anyone has ever made before, which is our biggest North Star. How do we enable that scale of content? I've worked in entertainment a long time, basically most of my career. And I'm usually working on one or two movies at a time, over a long period of time. And if you've seen it recently, Netflix is releasing a movie a week. So, it's a lot of content that needs to be made. Unfortunately, there's not enough people all in one place to make that. Cause it takes a ton of really talented artists to make all that great content. So, the idea behind making Netflix Workstations, which are these remote, cloud-based workstations, is that you can work from anywhere. And then, we can draw that talent in to work more conveniently than trying to convince everyone to move to one central location and one central office.

Thomas Betts: I think maybe it's not only having people move into the office, but I started my job as a remote employee, and they just shipped me a laptop. Why can't you just send out a computer to the artists and have them use those?

Michelle Brenner: Yeah, that's a question I get all the time. There's kind of two main reasons. One of them is flexibility. So, by having cloud-based workstations, you can give the user any experience and update it on the fly, right? You can give them a different GPU, a different operating system, all of that. And then with security, it's really nice to have the files and the computer all in one place. Traditionally in entertainment, files are passed along every which way, hard drive, email, FTP. And, kind of, every point that a file moves, there's a point for that file to be intercepted, someone to do something with it.

We don't want that. We want people to see the movie or the TV show once it's fully finished. We don't want people to see partially finished work. It gives a bad taste in your mouth. So, by having the files and the computer all in the cloud, all in the same place, it just removes one of the pathways for interception. But, I love the idea of giving people anything they need at the time they need it, right? Do they need a Windows workstation with a big hard drive? Do they need a CentOS workstation with a big GPU? Well, let me just spin one up for you in the cloud, and now you have it. If that's not exactly what you need, we can change it around. So, having all that flexibility is great.

Special Needs for Artists [03:43]

Thomas Betts: Yeah, it's interesting. We hear about the cloud being used a lot for, "I have this workload that's going to scale as I get more users coming to my website. And so, I just build up more instances, or have a bigger server, or whatever I do to scale." But you're talking about that on a very individual basis, that this person needs to have a bigger computer, or a different computer today than they did yesterday. And you're able to meet that.

Michelle Brenner: Yeah. Productions are very flexible and constantly changing. There's constantly improvements in technology, and software, and things like that. So, it's important to, kind of, stay ahead of that curve. In my experience, working in entertainment, you're getting new cameras, new types of footage, new data that they didn't have on the last production. So, you always need to be able to support that. One of the questions I get often is, "Hey, why don't you use already built tools, right? Different sorts of remote workstations that aren't, kind of, this homegrown system." And I just want to talk more about how it is to be an artist, right?

So, they just have really specialized needs. Thus, it's easier for us to prepare when we have these custom solutions. Having these petabytes of data is not something you can normally have. As an engineer, it's easy for me to use any remote workstation service, right? I just need an IDE, a little bit of storage for my text files, access to Stack Overflow, all that stuff. But as an artist, you need that super powerful computer and access to all that data. And you just get a lot more challenges at that scale. And it's just easier to meet those challenges in a custom solution, rather than try to push a solution made for the 90% of workflows into this artist's workflow.

Thomas Betts: Yes. I just think I've seen remote workstation solutions, and there was Citrix, and Wyse terminals, and different things back in the day. And now, I know AWS has their own just spin up an AWS workstation. They seem to be more to the common denominator of back office work. I'm going to use Word, and Office, and the internet. And it's basically, "I just didn't give you a laptop, I gave you a portal into a computer that's sitting on a server somewhere." And that certainly wouldn't meet the needs of your artists.

Michelle Brenner: Exactly. And that, any time delay between working can be horrible. You have to support also these peripherals, right? You have to support Wacom tablets. You have to support very large monitors. You have to support all these things that when you're building for the 90% of use cases, you're not going to build for that, cause that's not what most people need. So, it's important for us to get the things right, because if the artists don't get a workstation they can use and do their daily work, they're not going to want to use it at all. It's just going to be an impediment to them getting their work done. That's not something we want.

A very long time ago at an old job, I would use Chrome to remote into a computer that was at a different office when I didn't feel like going into my office and I'd forgotten to bring home my laptop. There's all sorts of solutions, but it was definitely that small delay. It's kind of unacceptable when you're working on really complex artistry and you really need to get your work done today.

Remote Display Protocols [06:30]

Thomas Betts: So, with these solutions, it's still a hosted machine in the cloud, but you're saying there isn't perceptible lag? They're able to still use computers as if they were sitting on their workstation in front of it at home?

Michelle Brenner: Yeah. That was a really important kind of table stakes for getting this to work. So, we use two different remote display protocols right now. One of them is Teradici, and one of them is NICE DCV. NICE DCV is an AWS service that just comes on an EC2 Instance. And when people are curious about using remote workstations, I usually recommend that because it just comes with it. Right? You bring up an EC2 Instance. You can have NICE DCV on it, and you can try it out, and you can play around with it, and see if maybe remote workstation is something you want to try out.

Cause people have asked me, "Oh, maybe I want to use it. No, I need to do big machine learning jobs. And I don't really want to do a batch service right now. I want to just try it out on a computer in the cloud."

"Well, just spin one up, and remote in, and see if that's something useful for you to do." And we kind of made it incumbent on those services. "Hey, we need the very little bit of latency to make this work." So, it was really important, before we started using them, that it worked for that.

Thomas Betts: I think I've seen some of those, especially... I'm a developer, so people are listening to this, probably familiar with the IDE in the browser. Not even remoting onto your laptop that you left in the office, but that you can get a version of Visual Studio Code that basically runs inside a browser. So, people are seeing more of those scenarios for, "I want to work remotely without having my entire laptop in front of me, my entire desktop." But again, those seem very limited, and not able to do the power that you need for your scenarios.

Michelle Brenner: Yeah. It's kind of the power and the flexibility. And we have tested out app streaming. We really liked that as well. So, with NICE DCV, you have that option of either doing an entire desktop, or just doing a single app in a browser window. So, you could have two tabs in a browser. And one of them is one application, one of them is another application, and they're actually on two separate computers. So, you're using that entire computer for one application and it's not interfering with each other.

But, you can switch around to two different types of work, and kind of have shared storage and that sort of thing, without having to do a lot of work to get that to happen, which I think is a really cool workflow that people try out. It was really important for us to have, kind of, flexible workflows so that artists can bring up anything they want. Do they want a whole desktop? Do they want just app streaming? Do they want this operating system, or this operating system? What software do they want? Having that flexibility was a big part of how we built it, to make it easy for them to do their work and not have to worry about "Where are my files? Where is my software? Where are the licenses for my software?" All that stuff is kind of abstracted away, and they don't have to deal with that.

Supporting Artist Applications and Workflows [09:03]

Thomas Betts: And, maybe for my benefit and the listeners, who falls under the category of artists in this scenario? Is it someone who's just drawing stuff up in Photoshop? Are they doing full 3D, CGI, computer animation, full animated movies?

Michelle Brenner: It's both, right? So, it's anywhere in the production pipeline, right? So, you have people doing Photoshop, just drawing up maybe a story board, or a previsualization, or a matte painting. And that, all the way to the end of production, where people are compositing in the final frames, and the final lighting, and everything in between. Production's really cool, cause there's a lot of different steps to it. And files kind of move along a track, and they're slowly changed until you have the final frame.

So, it all starts with original sketches, or original plates, which is the original film that you shoot in camera. And then, people keep adding things to it until you get the final shot. So, kind of tracking those files as it goes along, and task tracking. That's a whole 'nother challenge that we kind of build workstations into. And it's like workstations is the glue, and all those other pipelines go around it, where it's every artist could pick up a workstation, do their part of a shot. And then, the next artist can pick it up and do that part of the shot. So, it's a wide variety of use cases, which leads into, "Okay, we need to be flexible, for someone who's working on a single Photoshop file, to someone who's working on a terabyte of 8K plates, and need to draw on them."

Thomas Betts: Gotcha. And so, that idea of you could have one application in a browser might satisfy the Photoshop scenario, but if they're doing lots of different tasks in a day, they want the full workstation. And so, I can see where that flexibility really comes into play, that you can offer them exactly what they need that day.

Michelle Brenner: Yes. And part of it is, this is an entirely new thing, right? So, I started this in December 2019 with a new team, a new repo, totally blank. So, part of our learnings is getting people to use it, and then seeing what they want, and iterating. So, we've iterated on things like configuration management two or three times as we see how people use it. We had this idea of how people are going to use it. We're like, "Oh, we asked them. They said they're going to use it like this. And this team said they're going to use it like this, so that's how we're going to do it."

And then, people start using it, and they're like, "Oh, it's easier if we do it like this, now that we have it in front of us." I mean, that's basically the challenge of any product, in any technology. So, it's been a really great learning experience for us, and trying out different things, and making things available for different people.

Handling Configuration Options [11:24]

Thomas Betts: I'm going to go back to something you touched on earlier, which was that there's, I think, two major elements that anyone who's using a computer to do their job needs to be successful. You've got the applications, and then the data, or the files. And obviously, for me as a developer, that's my IDE and my git repos. But then, there's also a third layer, almost, of personalization. My settings, my customization; the things that I like to do to set up my machine so that it works the way I get familiar with it.

Let's start with the applications first. For the remote workstations, you said it's all about flexibility, and you give these different options. How do you enable that so that it's not a one size fits all approach? Do they get to go in and say, "Today, I need this." And it's like choosing toppings on your pizza, and that's the computer you get? Or do you have a few set configurations that you've come up with, that these are common scenarios?

Michelle Brenner: We use SaltStack for a lot of our software and environment configurations, things like that. And we've actually built up over time, probably 100, 150 different packages that people can use. And between my team, which is kind of the platform team, and the artist, we actually have a couple people in between that help us figure out what that team needs. So, there are Pipeline Engineers and Technical Directors that say, "Hey, my team needs these seven packages today." So, I'm going to build a spec that has these seven packages, and all their workstations today have these seven packages in it. And then, the next day they can be like, "Oh, I want to upgrade this one. Delete that spec. Give me a new spec that has these packages."

And if they're like, "Oh, hey, I need a package that we don't have available yet in our list." Either they can request it from someone on our team, or we've built this in a way to allow them to build it themselves. So, they could say, "Oh, hey, I have this software installer. I want to build a package that puts this software installer in. This is a new compositing software. We want to try it out. Let's put that in." So, it's allowing them to build up and add whatever they want as well, and they just come see us if they have a problem. Kind of that self-service development model is really where we're going towards. And what we really like is, we're a platform, you decide for your team what you want on it. And let us know if you have any questions, or you need more features for that.
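The spec idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Netflix's implementation and not SaltStack's API: the names `Spec`, `build_spec`, and `register_package`, the package names, and the version strings are all hypothetical. It shows a team-level spec as an immutable list of package pins drawn from a shared catalog, with a self-service path for adding a package that is not in the catalog yet.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Spec:
    """An immutable set of package pins applied to every workstation of a team."""
    team: str
    packages: Tuple[Tuple[str, str], ...]  # (package name, version) pairs


def build_spec(team: str, wanted: Iterable[str], catalog: Dict[str, str]) -> Spec:
    """Build a spec from package names, rejecting anything not in the catalog."""
    wanted = sorted(set(wanted))
    missing = [name for name in wanted if name not in catalog]
    if missing:
        # The team can request these packages or build them itself.
        raise KeyError(f"not in catalog: {missing}")
    return Spec(team, tuple((name, catalog[name]) for name in wanted))


def register_package(catalog: Dict[str, str], name: str, version: str) -> None:
    """Self-service path: a team adds its own package to the shared catalog."""
    catalog[name] = version


# Hypothetical catalog maintained by the platform team.
catalog = {"compositor": "3.0", "paint-tool": "1.2"}
spec = build_spec("lighting", ["paint-tool", "compositor"], catalog)
print(spec)
```

Replacing a spec, rather than editing it in place, mirrors the "delete that spec, give me a new spec" workflow: a workstation is always built from one complete, known list of packages.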

So, it's usually kind of on a team level that people decide what they need, but you mentioned individual settings. And that is actually a big challenge for us. Because when you just have a desktop, you use all this different type of software, and it just saves stuff on your computer, and you don't even have to think about it. It just saves it. Right? You just have a hard drive. But in cloud computing, there's this idea that workstation is ephemeral, right? It could disappear on you, because you're not controlling it. Right? When you're building a website, that's the reason why you have that horizontal scaling. "Oh, I have a hundred computers serving my website. So, even if 20 of them go down, my website's still going to be up." Right? But this is a different use case, where it's an individual person, and they need their work to always be available. So, we try to make very stable workstations that are available as long as possible, but there's always that chance you could lose it.

Saving individual settings was really important to us. And we had to think about that, as opposed to a computer, where you don't have to think about it. So, you have to attach external storage, you have to decide what gets saved on there, and you have to attach it to a user. So, they build a workstation today, and they build a different workstation tomorrow cause they want different software. Those user settings travel with them. Those reference files they've saved travel with them. So, that was definitely one of the challenges that we had to think about and do, that you didn't have to think about before. So, there's obviously pros and cons to remote versus stable workstations, and we think the pros are important enough to kind of solve these types of problems.

Thomas Betts: So, those individual settings go with them to any workstation they go to? So, if one computer had just Photoshop installed, but then the next one has something else, their personalization comes with them as part of their user account? Or does it go for that type of instance of the workstation?

Michelle Brenner: We want users to be able to save their data no matter where they go. So, if they're using a different type of software today versus tomorrow, whatever they've saved, because it's not just Photoshop settings, right? It's, "I save some cool reference images that I want to see later in the day, no matter what computer I'm on." Or, "I've saved a Python script that I wrote a little five lines that I like to use on everything, and loaded it to my instance." So, we wanted to have that flexibility to match the way people were already working, which is they're used to having a home directory on their computer that they could put whatever they want in it. And they know it's always going to be there.
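One way to picture settings that travel with the user is a home volume keyed by user rather than by workstation. The sketch below is a minimal in-memory illustration under assumptions: `HomeVolume`, `Workstation`, `VolumeStore`, and `launch` are hypothetical names, and a real system would attach network storage rather than share a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HomeVolume:
    """Persistent storage that belongs to a user, not to a workstation."""
    owner: str
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class Workstation:
    """Ephemeral machine: anything in scratch is lost when it goes away."""
    os_name: str
    home: HomeVolume
    scratch: Dict[str, str] = field(default_factory=dict)


class VolumeStore:
    """Hands out exactly one home volume per user."""

    def __init__(self) -> None:
        self._volumes: Dict[str, HomeVolume] = {}

    def volume_for(self, user: str) -> HomeVolume:
        # Create the volume on first use, then always return the same one.
        if user not in self._volumes:
            self._volumes[user] = HomeVolume(owner=user)
        return self._volumes[user]


def launch(store: VolumeStore, user: str, os_name: str) -> Workstation:
    """Build a fresh workstation and attach the user's existing home volume."""
    return Workstation(os_name=os_name, home=store.volume_for(user))


store = VolumeStore()
monday = launch(store, "artist", "linux")
monday.home.files["brush.cfg"] = "soft-round"
monday.scratch["render.tmp"] = "temporary data"

tuesday = launch(store, "artist", "windows")
print(tuesday.home.files)  # the saved setting is still there
print(tuesday.scratch)     # scratch data did not follow the user
```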

File Security and Backups [15:58]

Thomas Betts: Yes. That makes sense. The other part with the files, you mentioned early that security is a major concern. I know people would love to get their hands on the next episode of Stranger Things, or Umbrella Academy, or whatever's being worked on. Right? How does that change? And is the authorization something your team worked on, or was that another solution that was solved and you just had to integrate with?

Michelle Brenner: Well, one of the great things about working in a big company is that there are specialized teams you can kind of lean on for that type of thing. So, we leaned heavily on teams like security, like cloud networking, and all those other things so that we didn't have to build every piece of the puzzle. But, it was definitely number one in my mind.

Since I've been in entertainment so long, security is just this big flashing light bulb that goes in my head anytime I build anything. It's like, if you lose those files, people are going to be very upset, and it's going to ruin the experience for the creators and for the users, right? You don't want people seeing unfinished work. So, that was just like a critical portion for us, of, "Hey, let's make sure the right users are on the right workstations and see the right files. And there's no leaking into any other things." And it's just nice that that came with the cloud, whereas, if I'm putting this on this cloud storage and the artists are remoting in and using it on that storage, they never have to download it onto their computer, which is really nice.

Thomas Betts: And I assume that also then takes care of any backup issues. So, having to make sure that those files don't get lost, you don't have them just sitting on someone's hard drive on their desktop at home. They're always in the cloud, because that's the only place they exist.

Michelle Brenner: Yes, exactly. So, we could build those automated systems that make copies of things, and we can have complete control of it. And we don't have to worry that someone's working on that. They downloaded it on the computer at home, they did some changes, and then they're like, "All right, well I'm going to go take a break." And then if their computer melts down and they lost their day's work, we have it all replicated and ready to go. So, if something happens, it's already available for them.

Measuring Customer Satisfaction [17:43]

Thomas Betts: How do you know that this is working well for your end users? Do you just ask them, or do you have observability built in somehow, that you can watch and tell that things are going well?

Michelle Brenner: I feel like with observability, you could always have more, right? There's always more information you can gather about how people are using it, how long they're using it, are they using it and then abandoning it? How's the network going? How's the storage? Are people using it? But, the biggest thing for us is talking to the users and seeing if they're happy. Because, with any product, you want your users to be delighted. And if they're not delighted, it's a bad day for you and them alike.

I don't think people are afraid to tell us when things go wrong. You get that immediate feedback. Our entire team is on call on a pretty fast rotation, so that we're all getting that feedback very quickly. And we also reach out and say, "How's things going for you? Are you using this well? How are you using it? Is there different ways that we could try to do better?" And so, we're constantly iterating on that. And getting that feedback directly from both individual users and kind of those Technical Directors and Pipeline Engineers, who are kind of our super users, and are constantly using all the different parts of the application. Being like, "Hey, this could be better. We actually want to use it like this, that sort of thing."

Testing [18:55]

Thomas Betts: I wonder about testing. How important is testing for the whole stack that you're doing, and how is it handled? If something goes wrong, it seems like you could potentially prevent hundreds of people from being able to even turn on their computer that day. So, you got to make sure that it's a stable system.

Michelle Brenner: Yes. That is definitely the fear, right? If people can't get to work, it's a very bad day for them, and it's a very bad day for us. So, people started working from home a lot faster than we expected. That kind of compressed our timeline, right? So, we needed to get people up and running quickly. And one of those things that happens with engineers is, you think, "Oh, we can push testing and observability to later." And then, that's when being on call really highlights that problem, right? Things are going down. And we have users all over the world. So, I've answered on call 1, 2, 3 in the morning, and that really pushes you. "Okay, we got to get this testing in."

We've built this whole testing framework that constantly runs and is constantly pretending to be a user, right? It constantly brings up different workstations, and logs into them, and pokes around with them, and then just logs off, so that we know immediately if things are going wrong. And recently, in the new services, we've built in things like, "Okay, I'm making a change before I deploy it. I want to go through the entire system. I want to bake a new image. I want that image to build the workstation. I want a user to log into that workstation and touch things."
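A probe that "pretends to be a user" can be reduced to an ordered list of steps where the first failure is reported by name. The following Python sketch is an assumption-laden illustration, not the team's framework: `run_synthetic_user` and `example_steps` are hypothetical, and the steps are stand-ins where a real probe would call image baking, provisioning, and login services.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_synthetic_user(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, action in steps:
        try:
            action()
        except Exception:
            # Stop at the first failure so the report points at one stage.
            return name
    return None


def example_steps(broken_login: bool = False) -> List[Step]:
    """Stand-in steps that follow the stages named in the interview."""

    def login() -> None:
        if broken_login:
            raise RuntimeError("login failed")

    return [
        ("bake_image", lambda: None),
        ("build_workstation", lambda: None),
        ("login", login),
        ("touch_files", lambda: None),
        ("logoff", lambda: None),
    ]


print(run_synthetic_user(example_steps()))
print(run_synthetic_user(example_steps(broken_login=True)))
```

The same function can serve both uses mentioned here: run on a schedule against production, or run once against a candidate build before it is deployed.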

So, that builds that confidence that things are going right. And that obviously slows down deployments, right? It used to be I could deploy in seconds. Now it's going to take a little while, right? But that builds that confidence, that it's not breaking things as people are working, which is really important to us. I think that's a very common activity of mature products, right? I can break things for a while, and then people will just get sick of you, and they're not going to use it anymore. So, if you have people actually using it every day, you have to build in that gate keeping, so that when you deploy things, you feel confident about it. And then, people are happy with what you're doing.

That actually just happened. Super recently, I had this new little bit of software that I was deploying, and every time someone used it, they were always using latest. And someone managed to slip in, in between when I did a deploy, and then fixed 10 minutes later, and it broke for them. And it was just really not a great solution for them. So, I was like, "All right, guess what? I've tagged all the releases, and now we're going to have release notes, and you're not going to use the new release until you're happy with it. And we're both happy with it." So, it's those sort of things that just come along with a mature product that I think is very common to whatever anyone is doing. And it's just the growing pains of any product, which is what we're trying to address.
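The move from "everyone uses latest" to tagged releases can be illustrated with a small registry in which each consumer stays on its pinned tag until it opts in to a newer one. This is a hedged sketch: `ReleaseRegistry` and its methods are hypothetical names, and the tags and release notes are invented for the example.

```python
from typing import Dict


class ReleaseRegistry:
    """Tracks published tags and which tag each consumer has opted in to."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}  # tag -> release notes
        self._pins: Dict[str, str] = {}   # consumer -> pinned tag

    def publish(self, tag: str, notes: str) -> None:
        """Publish a new tagged release; existing pins are left untouched."""
        if tag in self._notes:
            raise ValueError(f"tag {tag!r} already published")
        self._notes[tag] = notes

    def pin(self, consumer: str, tag: str) -> None:
        """A consumer explicitly moves to a release once happy with it."""
        if tag not in self._notes:
            raise KeyError(tag)
        self._pins[consumer] = tag

    def resolve(self, consumer: str) -> str:
        """Return the tag this consumer runs; never an implicit 'latest'."""
        return self._pins[consumer]


registry = ReleaseRegistry()
registry.publish("v1", "initial release")
registry.pin("lighting-team", "v1")
registry.publish("v2", "new feature, see release notes")
print(registry.resolve("lighting-team"))  # still v1 until the team opts in
```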

Trade-offs [21:20]

Thomas Betts: I always like hearing very clearly called out the trade-offs that you have to make in doing any sort of design and architecture and solution. So, you said, "Oh, we could deploy faster, but it didn't have the confidence. And we favored a little bit more confidence and a little slower deployment process because then we know we don't break our users." And so, that's clearly one of those things, just like security was a big concern. You weren't going to just let everything go out there willy-nilly with no security day one.

Michelle Brenner: Yes. And, those decisions come up in big ways and small ways, right? So, I created a bunch of tests, and the Linux tests were so much faster than the Windows tests. I didn't turn the Windows tests on for deployment. And then of course, I broke Windows. And then, I just spent two days trying to figure out what change I had done that had broken Windows. And I was like, "What did I do?" And after that, I was like, "Okay, well I'm going to run the Windows test on every deployment, because it actually took more time for me to figure out what change it was than if I had done the Windows test at every deployment. I would've seen which change had broken it, and it wouldn't have actually deployed."

And it's those decisions that just kind of keep happening over and over again. I'm like, "Okay, great." Well now it takes... Even if I'm doing a change that I think only affects a small area, let me do all the integration tests. Because then, it'll help me solve that problem later, right? I won't spend two days digging through my code trying to figure out the change. So, it's always those trade-offs that happen, whether you're doing a little change, or the big architecture deployment for everyone.

Only Windows and Linux Workstations, for now [22:47]

Thomas Betts: Since you mentioned Linux and Windows, do people get Mac workstations as well?

Michelle Brenner: Not yet, but it is something we're exploring. In the news, right? They're becoming more available on AWS, so it's something we're looking into. It's basically like, "What are people using and what they're used to?" In my experience in entertainment, it's been a lot Linux. I used Linux for most of my career, and just that. And people are used to using that.

But then, there's some software that only comes on different operating systems. So, we want to make whatever people need, and what they're used to, available to them. So, people come to me tomorrow and say, "Hey, this software I can only use on a Mac." I'm like, "All right, let's explore what our options are. What are the instances that we can get? What can we give you? What will be useful to you? And is this the game changer for you? Is this software only available on the Mac, or can you use it just as well on a different, more supported operating system?"

Peripheral Support [23:37]

Thomas Betts: You briefly mentioned peripherals like Wacom tablets, or "Way-com" tablets. I always forget how to pronounce that. And large monitors and being able to support that. Are there any specific challenges with getting those to work on remote workstations? It's not just plug-and-play. It's not that simple, but is it that simple?

Michelle Brenner: Well, it's important for us to work with our partners at Teradici and NICE DCV to get those to work. Those are our number ones, right? If we give them a list of things like, "These have to work: Wacom tablets, dual monitors." They're at the top of our list, while things like a printer are not at the top of our list. In fact, please don't print out any images. We don't want you to. That would actually be bad.

So, it's just prioritizing things we know people want to use, and working with the different teams and saying, "Hey, this is what our artists need. We got to get this right or they're not going to be able to use this at all." A lot of artists will even just use peripherals like a Wacom tablet over a mouse, right? They won't even touch their mouse through the day because they're so used to using the Wacom tablet. So, we can't change their workflow because that won't create the art that we need. That's actually making things worse for them. So, it's important for us to say, "Hey, this is the table stakes. If they can't remote in and use this Wacom tablet, they can't use our system. And we need to ship them a computer until they can. So, we need to get this working right away."

Accelerated Project Schedule [24:49]

Thomas Betts: And you mentioned that working from home compressed the timeline, but you said this was always the plan, and that it started in 2019. This was a desire for Netflix to be able to bring in creators wherever they were. It was just to be able to get access to more creators to help you produce, you said, a movie a week.

Michelle Brenner: Yes, exactly. I joined this team in December 2019, before a lot of people started working from home. It was our plan to do a very slow roll out. "Okay, we'll try this team, who's in this region of the country. And then we're going to roll out to another region and those people are going to use it."

And instead it was, "Okay, well everyone's working from home. Let's see what we can get them as fast as possible, and give them a variety of things. And then we'll slowly improve, and make that experience better for them, instead of doing the slow roll out, kind of like one team at a time." So, it changed our strategy, but it didn't change our end goal. Our end goal from 2019 was always, let's get remote workstations to every person around the globe that we want to work on a Netflix original. And it just changed the timeline and the strategy, but that's still our number one goal. We want as many people as possible to be able to work wherever they are.

Dealing With Bandwidth and Latency Constraints [25:57]

Thomas Betts: Yeah. And I think the way you said it was that what Netflix is doing, just the level of content being created, the number of movies and TV shows, is different from other studios. And you said you've worked in the entertainment industry for a while, so you may know. Was it just easier to wait for people to show up, or could you handle shipping a computer to someone and dealing with the security issues if it was just one or two people, as opposed to what I'm perceiving as dozens or hundreds?

Michelle Brenner: Exactly. So, I've worked at Sony and Technicolor. In the past, it's always been only a few projects at a time. And most of it is in studio, right? You just stop the security at the door. You just couldn't access things from home or bring a laptop with you. You just had a really beefy box in the studio office. And some of the remote things we tried were more, "Okay, we want to have an office here, an office in another region, an office somewhere else," but it was always an office. That's all of my previous experience. You bring people into a central location, and have your own data center there, and have the on-prem solutions for your artists. And if they're working remote, it's just remote to another office that you completely control, but you could bring people into different offices.

And if you want someone to work and send you files, they're sending you files over FTP or they're mailing you a hard drive. I've plugged in a lot of hard drives. Sometimes that's actually faster. Hopefully things have changed a little bit now, but it used to be that, with the amount of data we were sending, it could actually be faster to put it on a hard drive and fly it on a plane than to send it over a network. And there's always the challenge, too, that where people are working, maybe they don't have strong network connectivity. Are they working on set in the middle of nowhere, getting these beautiful shots, but not somewhere where wifi has a strong presence? So, dealing with all those challenges meant that you were kind of hemmed into those offices. And now, it's like, "Okay, instead of working on one or two movies, we want to build that scale and use that technology."

That's part of the reason why I joined Netflix: this is an even bigger challenge for me. So, I've dealt with these challenges of, "Okay, how to make a giant movie, and deal with security, and tons of data, and these network issues, and wrangling all these different pipelines." That's really fun, but I've done this for a couple of years. And then, Netflix came in and was like, "Well, how would you like to do it at this unprecedented scale?"

And I was like, "Oh, it presents these huge challenges!" And Netflix has obviously done that on the streaming side with the 200 million streamers all over the world. And they've developed all these really cool, interesting technologies. And how can we leverage some of that for the artistic side? And we've definitely done that, in that I get to rely on all these great engineers who already work there and have faced all these challenges, and say, "Oh, what tools have you used that we can repurpose?" Right?

So, a great example is Spinnaker. Spinnaker is this open-source tool that you can use for deployments. That's usually what it's used for. And we actually repurposed it for fleet management, to control these pools of workstations and what's on them, using the tools that are already built and using them in a new way. So, you don't have to kind of build things from scratch. I hate building from scratch. I love repurposing, because, for me, it's all about getting to that end goal. And the more you can rely on these kinds of tested technologies, that a lot of cool engineers have already used, the faster you can get cool things for other people.
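Editor's illustration: the idea of controlling "pools of workstations and what's on them" can be sketched as a small reconcile step. This is not Spinnaker's API and not Netflix's implementation; the `Workstation` type, the image labels, and the `reconcile` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workstation:
    name: str
    image: str  # hypothetical label for the software image installed on the machine


def reconcile(pool: list[Workstation], desired_image: str, desired_size: int) -> dict:
    """Compare a pool with its desired state and report what would need to change."""
    # Machines on an old image are retired so the pool converges on one image.
    stale = [w.name for w in pool if w.image != desired_image]
    current = [w for w in pool if w.image == desired_image]
    # Anything beyond the desired size is surplus, even if it is up to date.
    surplus = [w.name for w in current[desired_size:]]
    to_launch = max(0, desired_size - len(current))
    return {"launch": to_launch, "retire": stale + surplus}


pool = [
    Workstation("ws-1", "v2"),
    Workstation("ws-2", "v1"),
    Workstation("ws-3", "v2"),
]
plan = reconcile(pool, desired_image="v2", desired_size=3)
```

In this sketch the plan only describes the change; a deployment tool would be the thing that actually launches or retires machines.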

Thomas Betts: Yeah. And I think, going back to your comment about shipping hard drives, I think it was an XKCD or a "what if" that was, what is the data transfer rate of FedEx? And so, for sufficiently large sets of data, it is faster to physically move molecules instead of just electrons. But it seems like with the fully cloud solution, you're never shipping that data back and forth to the user's machine at home that they're connected to. The workstation's in the cloud. The data's in the cloud. It sort of solves that bandwidth issue. You just have to have enough bandwidth to be remote-desktopped onto that workstation, right?
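Editor's illustration: the hard-drive-versus-network comparison is simple arithmetic. The sketch below assumes a link that sustains its full rated speed and uses made-up figures (50 TB of data, a two-day courier); none of these numbers come from the conversation.

```python
def network_transfer_hours(data_terabytes: float, link_megabits_per_s: float) -> float:
    """Hours needed to send the data over a link running at full, sustained speed."""
    bits = data_terabytes * 1e12 * 8  # decimal terabytes to bits
    seconds = bits / (link_megabits_per_s * 1e6)
    return seconds / 3600


def courier_is_faster(data_terabytes: float, link_megabits_per_s: float,
                      courier_hours: float) -> bool:
    """True when physically shipping the drive beats the network transfer."""
    return courier_hours < network_transfer_hours(data_terabytes, link_megabits_per_s)


# Hypothetical figures: 50 TB of footage and a courier that takes 48 hours.
slow_link = network_transfer_hours(50, 100)     # about 1,111 hours, roughly 46 days
fast_link = network_transfer_hours(50, 10_000)  # about 11 hours
```

On the slow link the courier wins easily; on the fast one the network does, which is why keeping both the workstation and the data in the cloud sidesteps the question.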

Michelle Brenner: Exactly. And it's all about getting it into that cloud ecosystem, because once it's in that ecosystem, it's about making it available in different regions, right? So, we're slowly building out all the different regions that we support, because that helps with the latency, right? Getting that data there. But that's more of a data move problem than an individual problem. An individual problem is someone's saying, "Okay, I need to get these files there in London. I'm in Singapore. How do I get them?"

Versus, "Okay, it's already on the cloud. Let's automatically move this data to this different region, because we know we're going to have artists at that region that need this, and then it's already ready to go." And it's not an individual's problem of, "How do we shift these files around?" It's more on us: "What are the regions we support? Let's make all the data available to be picked up in all the regions we support so people can just get to work."
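Editor's illustration: moving data ahead of the artists can be thought of as a planning step that compares where each asset already lives with where it will be needed. The sketch below is an assumption about how such a plan might be computed; the region names, asset names, and the `plan_replication` function are hypothetical.

```python
def plan_replication(asset_regions: dict[str, set[str]],
                     needed_regions: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return the (asset, source_region, target_region) copies still required."""
    plan = []
    for asset, targets in sorted(needed_regions.items()):
        present = asset_regions.get(asset, set())
        if not present:
            continue  # the asset is not in the cloud yet, so there is nothing to copy from
        source = sorted(present)[0]  # any existing copy will do; pick one deterministically
        for target in sorted(targets - present):
            plan.append((asset, source, target))
    return plan


# Hypothetical example: footage stored in London, with artists expected in Singapore.
stored = {"show-a/shots": {"london"}}
needed = {"show-a/shots": {"london", "singapore"}}
copies = plan_replication(stored, needed)
```

Because the plan is derived from where artists are expected to be, nobody has to request an individual file transfer.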

Next Steps - Self-Service Platform [30:35]

Thomas Betts: I think just to wrap up, I want to know what's next for this. It was started as a platform for creators. Are there other use cases and other user groups that you think are going to be benefiting from this within Netflix?

Michelle Brenner: I touched on this a little bit, the kind of self-service user platform we want to build. So, it actually started as more of a fully-formed product, where we built a UI, and we built a lot of these packages, and did all the configuration. And we kind of white-gloved the whole service just to get us up and running as fast as possible.

So, our plan now is to bring other people in, and have it be a much stronger platform, and build it out. I mentioned regions: building out to different regions, building out to different operating systems, if people want a Mac now, not just Linux or Windows. And building out that really stable platform, and building those APIs on top of it, so other people can build whatever system they want. Say they want to automate bringing in the files and automatically building a workstation that has them: this is the software it has, and these are the users that have access to that workstation. Instead of people having to come to us and have us build a whole product, they can leverage it as a platform and build on top of it.
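Editor's illustration: a self-service request of the kind described here (files, software, and the users who get access) might be modelled as a small data structure that is validated before anything is built. Every name below, including `WorkstationRequest`, its fields, and the supported operating system list, is a hypothetical assumption and not the platform's real API.

```python
from dataclasses import dataclass, field

# Assumption for the example: only these operating systems are offered today.
SUPPORTED_OS = {"linux", "windows"}


@dataclass
class WorkstationRequest:
    operating_system: str
    region: str
    software: list[str] = field(default_factory=list)
    users: list[str] = field(default_factory=list)
    file_sets: list[str] = field(default_factory=list)


def validate(request: WorkstationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    problems = []
    if request.operating_system.lower() not in SUPPORTED_OS:
        problems.append(f"unsupported operating system: {request.operating_system}")
    if not request.users:
        problems.append("at least one user must be granted access")
    return problems


request = WorkstationRequest(
    operating_system="Linux",
    region="singapore",
    software=["compositing-suite"],
    users=["artist-1"],
    file_sets=["show-a/shots"],
)
problems = validate(request)
```

Returning a list of problems instead of raising on the first one lets a calling tool show the requester everything that needs fixing at once.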

I really love the idea of enabling other engineers to build. It kind of expands the reach of what you could do. We're always going to focus on creatives, but creatives are enough of a big use case for us to build all sorts of really interesting things that can serve all different types of creatives. So, it's about being able to build things for the different types of artists we want to expand on, and building tools so that the other engineers can build for whatever use cases they find.

I want to do one last plug. If this sounds really interesting for you, please reach out to me. You can reach out to me on LinkedIn. Just give me a message so I know you're not a robot. And then, there's plenty of room on the team to grow and we'd love to have you.

Thomas Betts: Well, that sounds great. I want to thank Michelle Brenner once again for joining me today.

Michelle Brenner: Thank you so much for having me.


