Preslav Le on How Dropbox Moved off AWS and What They Have Been Able to Do Since
As InfoQ previously reported in March 2016, Dropbox announced that they had migrated away from Amazon Web Services (AWS).
In this week's podcast Robert Bluman talks to Preslav Le. Preslav has been a software engineer at Dropbox for the past three years, contributing to various aspects of Dropbox’s infrastructure including traffic, performance and storage. He was part of the core oncall and storage oncall rotations, dealing with high-urgency, real-world issues, from bad code pushes to complete datacenter outages.
- Dropbox migrated away from Amazon S3 to their own data centres to allow them to optimise for their specific use case.
- They are experimenting with Shingled Magnetic Recording (SMR) drives for primary storage to increase storage density. All writes go to an SSD cache and then get pushed asynchronously to the SMR disk.
- Their average block size is 1.6MB with a maximum block size of 4MB. Knowing this allows the team to tune their storage system.
- Three languages are used for the backend infrastructure. Python is used mainly for business logic, Go is the primary language used for heavy infrastructure services, and in some cases, for example where more direct control over memory is needed, Rust is also used.
- Dropbox invest very heavily in verification and automation. A verifier scans every byte on disk and checks that it matches the checksum in the index. Verification is also used to check that each box has the block keys it should have.
Dropbox’s motivation for moving off the cloud
- 2m:40s - Dropbox used Amazon S3 and other services where it made sense, but they stored all the metadata in their own data centres.
- 3m:30s - Initially this was done because Amazon had poor support for persistent storage at the time. This has since improved, but it didn’t make sense for Dropbox to move the metadata back.
- 4m:01s - By that time the Dropbox team was ready to tackle the storage problem and built their own in-house replacement for S3, called Magic Pocket. Magic Pocket allowed Dropbox to move away from Amazon altogether.
- 4m:30s - The move saved money, but also allowed Dropbox to optimise for their specific use case and be faster.
- 5m:04s - There is a cross-over point when you get to a certain scale where building custom hardware that is optimised for your use case allows you to save money.
- 5m:28s - It makes sense to build your own solution when it is a core part of your business.
- 5m:57s - Dropbox is experimenting with cheaper Shingled Magnetic Recording (SMR) drives to increase storage density over the conventional Perpendicular Magnetic Recording (PMR) drives currently used.
- 6m:04s - The idea is that the read head can be smaller than the write head, so written tracks can partially overlap. You gain increased density, but you can only write sequentially and writes are slower. The Dropbox team altered their software to accommodate this.
- 6m:25s - Likewise, they built their own chassis to fit as many hard drives in a single box as possible.
- 6m:34s - With 8 machines and 102 disks in each, the machines are now so heavy that the floor cannot hold any more, so the team needs to reinforce it.
Usage patterns in Dropbox
- 8m:51s - When you store blocks they almost immediately get accessed, but over time they get colder and colder.
- 9m:10s - Their average block size is 1.6MB with a maximum block size of 4MB. Knowing this allows the team to tune their storage system.
- 9m:39s - To use SMR for primary storage, the team added an SSD cache. All writes go to the SSD and then get pushed asynchronously to the SMR disk.
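
The SSD-in-front-of-SMR arrangement described above can be sketched roughly as follows. This is a minimal illustration, not Dropbox's implementation: the class names `SequentialDisk` and `WriteBackCache` are hypothetical, the "disk" is an in-memory list that only supports appends, blocks are assumed to be immutable once written, and the flush is triggered by hand rather than by a background task.

```python
from collections import deque


class SequentialDisk:
    """Hypothetical stand-in for an SMR drive: blocks are only ever appended."""

    def __init__(self):
        self.log = []      # block contents in the order they were written
        self.offsets = {}  # block key -> position in the log

    def append(self, key, data):
        self.offsets[key] = len(self.log)
        self.log.append(data)

    def read(self, key):
        return self.log[self.offsets[key]]


class WriteBackCache:
    """Hypothetical stand-in for the SSD tier: absorbs writes, flushes later."""

    def __init__(self, disk):
        self.disk = disk
        self.staged = {}        # key -> data that is not yet on the disk
        self.pending = deque()  # staged keys in arrival order

    def put(self, key, data):
        # Assumption: blocks are immutable, so a key that is already known
        # is not written a second time.
        if key in self.staged or key in self.disk.offsets:
            return
        self.staged[key] = data
        self.pending.append(key)

    def get(self, key):
        # Freshly written blocks are served from the fast tier.
        if key in self.staged:
            return self.staged[key]
        return self.disk.read(key)

    def flush(self, max_blocks=None):
        """Move staged blocks to the disk as sequential appends."""
        flushed = 0
        while self.pending and (max_blocks is None or flushed < max_blocks):
            key = self.pending.popleft()
            self.disk.append(key, self.staged.pop(key))
            flushed += 1
        return flushed


disk = SequentialDisk()
cache = WriteBackCache(disk)
cache.put("block-a", b"first")
cache.put("block-b", b"second")
print(cache.get("block-a"), len(disk.log))  # read hits the cache; disk untouched
cache.flush()                               # in a real system this runs asynchronously
print(len(disk.log))
```

The point of the split is that callers only ever wait on the fast tier, while the slow tier sees nothing but ordered appends, which is the access pattern SMR drives handle well.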
The file system
- 10m:10s - Dropbox has its own file journal backed by MySQL databases.
- 10m:33s - The journal uses an append-only log, so when a file is modified a new line is appended and the old line is marked as deleted. This is how file version history is implemented.
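
A minimal sketch of the append-only idea, assuming an in-memory list in place of the MySQL-backed journal. The names `Entry` and `FileJournal` and the fields `path`, `block_keys` and `deleted` are hypothetical and chosen only for illustration; the podcast does not describe the real schema.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    block_keys: tuple
    deleted: bool = False  # set when a newer entry supersedes this one


class FileJournal:
    """Hypothetical append-only journal: rows are added, never rewritten."""

    def __init__(self):
        self.entries = []  # every version ever recorded, oldest first
        self.latest = {}   # path -> index of the live entry

    def record(self, path, block_keys):
        # Mark the previous version as deleted instead of overwriting it.
        previous = self.latest.get(path)
        if previous is not None:
            self.entries[previous].deleted = True
        self.entries.append(Entry(path, tuple(block_keys)))
        self.latest[path] = len(self.entries) - 1

    def current(self, path):
        return self.entries[self.latest[path]]

    def history(self, path):
        # Version history falls out of the log: old rows are still there.
        return [e for e in self.entries if e.path == path]


journal = FileJournal()
journal.record("/notes.txt", ["blk-1"])
journal.record("/notes.txt", ["blk-2", "blk-3"])
for version in journal.history("/notes.txt"):
    print(version)
```

Because old rows are only flagged and never removed, restoring an earlier version amounts to appending a new entry that points at the older block keys.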
Design goals and objectives
- 11m:15s - Dropbox needed to migrate all their users from S3 while the system was running.
- 11m:55s - The requirements were to be as reliable as Amazon, be as fast or faster, and be able to handle massive scale.
- 12m:20s - The project was a complete success. All users were moved and no one noticed!
- 13m:02s - You cannot migrate 500PB overnight. It takes months.
- 13m:30s - The team built a bulk file routing service so they knew where the data for each folder or each user was located, allowing them to switch between in-house and Amazon.
- 13m:54s - For a while they had the data in both places, but read it only from Magic Pocket. If disaster happened, they could switch back.
- 14m:12s - Launch day was the day they deleted the first byte from Amazon.
- 14m:25s - The same routing service is now used to support European storage.
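
The migration pattern described above (write to both stores, choose the read source per user, keep the option to switch back) might look something like this. It is a sketch under stated assumptions: `BlockRouter` and its methods are hypothetical names, plain dictionaries stand in for S3 and Magic Pocket, and routing is per user only, whereas the real service also routes per folder.

```python
class BlockRouter:
    """Hypothetical routing layer used while data lives in two stores."""

    def __init__(self, legacy, in_house):
        self.stores = {"legacy": legacy, "in_house": in_house}
        self.read_from = {}  # user -> store name; users default to "legacy"

    def put(self, user, key, data):
        # During the migration every write lands in both stores.
        for store in self.stores.values():
            store[key] = data

    def backend_for(self, user):
        return self.read_from.get(user, "legacy")

    def get(self, user, key):
        return self.stores[self.backend_for(user)][key]

    def switch(self, user, name):
        # Flipping a user is a metadata change only; no data moves here.
        if name not in self.stores:
            raise ValueError(f"unknown store: {name}")
        self.read_from[user] = name


legacy_store, in_house_store = {}, {}
router = BlockRouter(legacy_store, in_house_store)
router.put("alice", "blk-1", b"payload")
router.switch("alice", "in_house")   # start serving alice from the new store
print(router.backend_for("alice"), router.get("alice", "blk-1"))
router.switch("alice", "legacy")     # fallback path if something goes wrong
print(router.backend_for("alice"), router.get("alice", "blk-1"))
```

Keeping both copies until launch is what makes the fallback cheap: switching back only changes which store is consulted.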
Rust and other programming language use
- 15m:27s - Dropbox uses three languages for the backend infrastructure. Python is used mainly for business logic, Go is the primary language used for heavy infrastructure services, and in some cases Rust is also used.
- 15m:57s - Rust is used for the disk storage service OSD, which is one of the most critical parts of Magic Pocket.
- 16m:13s - Go wasn’t a great fit for this because of garbage collection. The service is very memory-intensive, and over time increasing density from the use of SMR and bigger disks exacerbated the problem. Rust provides more direct control over memory.
- 16m:42s - Rust is also being used for the desktop client where again, very fine control over resources is important.
- 17m:29s - If you are running off the cloud, you need to do capacity planning to avoid needing to keep too much idle capacity.
- 17m:55s - Dropbox moved off Amazon because it made sense for them. If at any point it made sense to move back, they would do so. Cloud providers are still used for things like DNS and email.
Deployment and verification
- 18m:32s - When you build a storage service, what you are most afraid of is losing data. So Dropbox replicate each block across different storage regions in different data centres and different geographic locations. But if you have the same data in two places and the same bug in both, it doesn’t help.
- 19m:02s - An API across the storage zones supports a staged release process, releasing to one zone at a time.
- 19m:42s - Dropbox also invest very heavily in verification and automation.
- 20m:10s - There is verification at different layers of the stack. A verifier scans every byte on disk and checks that it matches the checksum in the index. Verification is also used to check that each box has the block keys it should have.
- 20m:29s - Blackbox testing is used to make sure that a block that was sent to a box is still there after a time interval (two days, two weeks, and so on).
- 21m:11s - Disks fail every day at this scale. When it happens, the system will re-replicate the data from somewhere else.
- 22m:02s - The team will also deliberately corrupt something to make sure the verifiers work.
- 23m:00s - Dropbox has tooling for deployment, replacing databases, hosts and so on. They don’t use much open source software for this.
- 23m:37s - When you build a storage system, you need to understand the hardware more than you typically do as a software engineer. For example, you can saturate a switch or a router, and switches and drives have bugs, especially when you work with new technology.
- 24m:40s - Data is replicated across hard drives from multiple vendors to minimise risk.
- 25m:07s - For newer technology like SMR, the team also works very closely with the manufacturer testing early versions of the firmware before it goes into production.
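
The checksum verification, deliberate corruption and re-replication steps described above can be illustrated with a small sketch. Everything here is an assumption made for the example: SHA-256 stands in for whatever checksum Dropbox actually uses, dictionaries stand in for a disk, its index and a replica, and the function names `checksum`, `verify` and `repair` are hypothetical.

```python
import hashlib


def checksum(data):
    # Assumption: SHA-256; the podcast does not name the algorithm.
    return hashlib.sha256(data).hexdigest()


def verify(disk, index):
    """Compare every block on disk with the checksum recorded in the index."""
    missing = sorted(k for k in index if k not in disk)
    corrupt = sorted(
        k for k in index if k in disk and checksum(disk[k]) != index[k]
    )
    return {"missing": missing, "corrupt": corrupt}


def repair(disk, index, replica):
    """Re-copy any missing or corrupt block from a healthy replica."""
    report = verify(disk, index)
    for key in report["missing"] + report["corrupt"]:
        disk[key] = replica[key]
    return report


blocks = {"blk-1": b"alpha", "blk-2": b"beta", "blk-3": b"gamma"}
index = {key: checksum(data) for key, data in blocks.items()}
disk = dict(blocks)
replica = dict(blocks)

# Deliberately damage the disk to check that the verifier notices.
disk["blk-2"] = b"bit rot"
del disk["blk-3"]

print(repair(disk, index, replica))
print(verify(disk, index))
```

Injecting a known fault and confirming it is reported is the same idea as the deliberate corruption mentioned above: it tests the verifier itself, not just the data.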