Parse Got a Tenfold Reliability Improvement Moving from Ruby to Go

To improve scalability, Parse moved part of their services, including their API, from Ruby on Rails to Go, recounts Charity Majors, an engineer at Parse. In doing so, they greatly improved both reliability and deployment times.

Parse started in 2011 as a Ruby on Rails project. This choice made it possible for a small group of engineers to implement the first version of Parse quickly, says Majors. However, as Parse grew, their architecture started to show a few problems:

The growth of the Parse code base made deployments and rollbacks lengthy; furthermore, their HTTP server, Unicorn, stopped restarting gracefully. The end result was more and more monkeypatching.
Rails’ “one process per request” model started to “fall apart” as API traffic and the number of apps grew. The Rails worker pool filled up easily when many slow requests came in, because auto-scaling could not grow the pool fast enough. Moreover, many of those workers were mostly just waiting on external services.
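
The second problem can be illustrated with a small simulation. The sketch below is not Parse's code: `drainMillis` and its parameters are hypothetical names, and the sleep merely stands in for a request waiting on an external service. It shows how a fixed-size worker pool serializes slow requests into waves once every worker is busy.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// drainMillis simulates a fixed pool of `workers` handling `requests` slow
// requests, each blocked for latencyMs on a simulated external service.
// It returns the wall-clock time, in milliseconds, needed to finish them all.
func drainMillis(requests, workers, latencyMs int) int64 {
	slots := make(chan struct{}, workers) // one token per busy worker
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < requests; i++ {
		slots <- struct{}{} // blocks while every worker is busy
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-slots }() // free the worker when done
			// The worker is waiting here, not computing.
			time.Sleep(time.Duration(latencyMs) * time.Millisecond)
		}()
	}
	wg.Wait()
	return time.Since(start).Milliseconds()
}

func main() {
	fmt.Println("pool of 2:", drainMillis(8, 2, 50), "ms")
	fmt.Println("pool of 8:", drainMillis(8, 8, 50), "ms")
}
```

With eight requests and two workers, the requests are processed in four waves, so the total time is at least four times the latency of a single request; with eight workers they all wait in parallel.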

Those issues led Parse engineers to realize they needed to move to a truly asynchronous model, leaving behind Rails’ “one process per request” model. Ruby was not deemed fit for the task, because the vast majority of gems are not asynchronous and many are not thread-safe either, so they set out to choose a different language. They considered the following alternatives:

JRuby, which was ruled out because, although the JVM can handle massive concurrency, Ruby would still suffer from the lack of asynchronous library support.
C++, which was ruled out because it was considered less productive than other languages and lacking in library support for things like HTTP request handling or async operations.
C#, which was deemed a very good option for its async/await model, but was ruled out because “C# development on Linux always felt like a second-class citizen”.
Go, finally, was considered the best available option thanks to its native support for asynchronous operations, best-in-class support for MongoDB, and goroutines.
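
To give an idea of what Go's concurrency model looks like in practice, here is a minimal sketch using only the standard library. Go's `net/http` server handles each incoming connection in its own goroutine, so a handler that blocks on a slow dependency does not tie up a scarce worker process. The names `slowHandler` and `serveConcurrently` are hypothetical, and the 50 ms sleep is an assumed stand-in for an external call; none of this is taken from Parse's code base.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// slowHandler stands in for an API endpoint that mostly waits on an
// external service (simulated here with a sleep).
func slowHandler(w http.ResponseWriter, r *http.Request) {
	time.Sleep(50 * time.Millisecond)
	fmt.Fprintln(w, "ok")
}

// serveConcurrently starts a throwaway local test server, sends n requests
// at the same time, and returns how many of them received a 200 response.
func serveConcurrently(n int) int {
	srv := httptest.NewServer(http.HandlerFunc(slowHandler))
	defer srv.Close()

	var ok int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { // one goroutine per client request
			defer wg.Done()
			resp, err := http.Get(srv.URL)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt64(&ok, 1)
			}
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&ok))
}

func main() {
	start := time.Now()
	served := serveConcurrently(100)
	fmt.Printf("served %d requests in %v\n", served, time.Since(start).Round(time.Millisecond))
}
```

Because the slow requests wait in parallel rather than queueing for a free worker, the total time should stay much closer to the latency of a single request than to the sum of all of them.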

Parse engineers tested their choice by rewriting their EventMachine-based push backend in Go. As a result, they went from 250k connections per node to 1.5 million. From there, they went on to rewrite their services one by one, including the core API, all the while guaranteeing backward compatibility.

At the end of the process, the results were worth the effort, according to Majors:

Parse’s “reliability improved by an order of magnitude”.
The API no longer became more and more fragile as the number of spun-up databases grew.
The time taken by the full integration test suite dropped from 25 to 2 minutes.
Full API server deploy dropped from 30 to 3 minutes.
The API server can finally be restarted gracefully.
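
As an illustration of the last point, the sketch below shows how a Go HTTP server can stop accepting new connections while letting in-flight requests finish, using `http.Server.Shutdown` from the standard library. This is an assumption-laden example rather than a description of Parse's setup: `Shutdown` was added to Go after the migration described here, the function name `shutdownWhileBusy` is hypothetical, and a full graceful restart additionally requires handing the listening socket over to the new process, which is not shown.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// shutdownWhileBusy starts a server, begins one slow request, then asks the
// server to shut down. It returns the status code that the in-flight request
// received, showing that the request was allowed to complete.
func shutdownWhileBusy() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
	if err != nil {
		return 0, err
	}

	started := make(chan struct{})
	var once sync.Once
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		once.Do(func() { close(started) }) // signal that a request is in flight
		time.Sleep(100 * time.Millisecond) // simulated slow work
		fmt.Fprintln(w, "done")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	type result struct {
		status int
		err    error
	}
	done := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/")
		if err != nil {
			done <- result{0, err}
			return
		}
		resp.Body.Close()
		done <- result{resp.StatusCode, nil}
	}()

	<-started // the request is now being handled
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Shutdown closes the listener and waits for active requests to finish.
	if err := srv.Shutdown(ctx); err != nil {
		return 0, err
	}
	r := <-done
	return r.status, r.err
}

func main() {
	status, err := shutdownWhileBusy()
	fmt.Println("in-flight request status:", status, "error:", err)
}
```

The timeout on the context bounds how long the server waits for slow requests before giving up, which keeps a deploy from hanging on a stuck connection.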
