How Coinbase Handled Scaling Challenges on Their Cryptocurrency Trading Platform

by Hrishikesh Barua, on Aug 12, 2018. Estimated reading time: 2 minutes

Coinbase, a digital currency exchange, faced scaling challenges on their platform during the 2017 cryptocurrency boom. The engineering team focused on upgrading and optimizing MongoDB, using traffic segregation to resolve hotspots, and building capture and replay tools to prepare for future surges.

Customer traffic at Coinbase spiked to more than five times the anticipated maximum during May-June 2017, causing downtime. The team attacked the easy issues first: vertical scaling, upgrading MongoDB for performance improvements, index optimization, and traffic segregation based on hotspots. The existing monitoring system could not surface enough contextual information, so it was augmented with code instrumentation that logged the missing data. Even with these improvements, Coinbase faced multiple outages again during the December 2017 Bitcoin price surge. The team has since focused on ensuring their systems can handle higher traffic volumes by emulating traffic patterns with capture and replay tools.
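The kind of code instrumentation described above can be sketched as a thin timing wrapper around database calls. Everything below is a hypothetical illustration, not Coinbase's actual code: the `InstrumentedClient` class name and the log format are assumptions, and a plain delegating object stands in for the real driver.

```ruby
require "logger"
require "benchmark"

# Hypothetical sketch: wrap a database client so every call logs the
# contextual data the stock monitoring lacked (operation name, the
# first argument, elapsed time). Illustrative only, not Coinbase's code.
class InstrumentedClient
  def initialize(client, logger: Logger.new($stdout))
    @client = client
    @logger = logger
  end

  # Delegate any driver call to the wrapped client, timing it and
  # logging the missing context alongside the latency.
  def method_missing(name, *args, &block)
    result = nil
    elapsed = Benchmark.realtime do
      result = @client.public_send(name, *args, &block)
    end
    @logger.info(
      "db_call op=#{name} args=#{args.first.inspect} " \
      "elapsed_ms=#{(elapsed * 1000).round(2)}"
    )
    result
  end

  def respond_to_missing?(name, include_private = false)
    @client.respond_to?(name, include_private) || super
  end
end
```

In a real deployment the wrapped object would be the MongoDB driver client; the same effect can also be had via the driver's own command-monitoring hooks.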

Both Coinbase’s Ruby app and MongoDB experienced higher latencies during the initial outages, with the time split roughly equally between the two. To better understand the context of these calls across components, the team logged additional data by modifying MongoDB's database driver. This narrowed the issue down to an unoptimized response object that increased the network load; fixing it gave the application a performance boost. Large read throughput was achieved by adding Memcached-based caching at the Object-Relational Mapping (ORM) layer as well as in the driver layer, and adding missing indices further improved response times. By June 2017, the team had upgraded their MongoDB clusters to 3.2, which uses the faster WiredTiger storage engine by default. Coinbase uses Redis to implement services like rate limiting, and these services were affected during the outages because of Redis's single-threaded model.
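The Memcached-at-the-ORM-layer fix described above amounts to read-through caching. The sketch below uses a plain Hash in place of a Memcached client (a real deployment would use a client gem such as dalli), and the `CachedFinder` name and key scheme are illustrative assumptions, not Coinbase's code:

```ruby
# Sketch of read-through caching at the ORM layer, as the article
# describes Coinbase adding with Memcached. A plain Hash stands in
# for the Memcached client; class and key names are illustrative.
class CachedFinder
  def initialize(store, cache = {})
    @store = store   # the underlying database / ORM
    @cache = cache   # stand-in for a Memcached client
  end

  def find(collection, id)
    key = "#{collection}:#{id}"
    # Serve from cache when possible; otherwise read through to the
    # store and populate the cache for subsequent reads.
    @cache[key] ||= @store.find(collection, id)
  end
end
```

The point of placing the cache at the ORM layer is that every read path benefits without touching call sites; a production version would also need TTLs and invalidation on writes.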

To prepare for future traffic surges, the team has built tools called Capture and Cannon that capture traffic from production systems and replay it on demand against new systems to test their resilience. Both are based on mongoreplay, a tool that captures traffic to MongoDB instances from the network interface and records the commands being invoked; the resulting log can then be replayed against another MongoDB instance. Traffic is captured across multiple application servers and merged into a single file. The captured traffic, along with a disk snapshot, is stored on AWS S3, from where Cannon replays it later.
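The underlying mongoreplay workflow looks roughly like the commands below. The interface name, port, file name, and target host are placeholders; Coinbase's Capture and Cannon wrappers add the multi-server merging and S3 storage on top of this basic record/play cycle.

```shell
# Record MongoDB wire traffic seen on a network interface (run on an
# application server; eth0 and port 27017 are placeholders).
mongoreplay record -i eth0 -e "port 27017" -p recording.bson

# Later, replay the recorded commands against a candidate cluster to
# see how it holds up (the hostname is a placeholder).
mongoreplay play -p recording.bson --host mongodb://test-cluster:27017
```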

Coinbase maintains a public status page at https://status.coinbase.com/.
