
Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs


Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company evaluated several storage technologies and decided to combine a relational database with an object store to strike a balance between queryability and scalability while keeping costs down.

Zendesk created the persistence solution for event-stream data using DynamoDB for storage. The initial design worked well, but the solution became increasingly expensive to operate. The team switched to a provisioned billing model and cut costs by 50%, but with the growth of the customer base and the need for Global Secondary Indexes (GSIs) to support new query patterns, the cost of running the architecture became unsustainable.

The Initial Architecture Using DynamoDB (Source: Zendesk Engineering Blog)

As Zendesk runs its platform on AWS, the team looked for alternative storage solutions that could meet functional and technical requirements while reducing costs. They considered S3, Hudi (the data lake technology used at Zendesk), Elasticsearch, and MySQL, but ruled out Hudi because of its complexity and 24-hour ingestion delay, and Elasticsearch because its costs were comparable to DynamoDB's. In the end, the team decided to use MySQL to buffer logs consumed from Apache Kafka and to store metadata, with S3 holding the raw data in batches of 10,000 logs per file.

The ingestion flow stores log data consumed from Kafka in a buffer table in MySQL. Every hour, a background job uploads new records from the buffer table to S3 in batches of 10,000 logs per file and inserts one metadata record per S3 file. Another hourly job deletes logs older than four hours from the buffer table.
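The sketch below illustrates this flow under assumed names: the log_buffer and log_files tables, the bucket name, and the column layout are hypothetical, not Zendesk's actual schema, but the 10,000-log batching, the per-file metadata record, and the four-hour purge mirror the description above.

```python
# Illustrative sketch only: table names, columns, and credentials are assumptions.
import json
import uuid

import boto3
import pymysql

BATCH_SIZE = 10_000
BUCKET = "example-log-archive"  # hypothetical bucket name

s3 = boto3.client("s3")
db = pymysql.connect(host="mysql-host", user="app", password="...", database="logs")

def archive_new_logs(last_archived_id: int) -> int:
    """Upload buffered logs newer than last_archived_id to S3 in 10k-log files."""
    with db.cursor() as cur:
        cur.execute(
            "SELECT id, account_id, created_at, payload FROM log_buffer "
            "WHERE id > %s ORDER BY id",
            (last_archived_id,),
        )
        rows = cur.fetchall()

    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        key = f"logs/{uuid.uuid4()}.json"
        # Newline-delimited JSON keeps the file friendly to S3 Select later on.
        body = "\n".join(
            json.dumps({"id": r[0], "account_id": r[1],
                        "created_at": str(r[2]), **json.loads(r[3])})
            for r in batch
        )
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())

        # One metadata row per S3 file; its time range drives later look-ups.
        with db.cursor() as cur:
            cur.execute(
                "INSERT INTO log_files (s3_key, min_created_at, max_created_at, row_count) "
                "VALUES (%s, %s, %s, %s)",
                (key, batch[0][2], batch[-1][2], len(batch)),
            )
        db.commit()
        last_archived_id = batch[-1][0]
    return last_archived_id

def purge_old_buffer_rows() -> None:
    """Drop buffer rows older than four hours; they are already archived in S3."""
    with db.cursor() as cur:
        cur.execute("DELETE FROM log_buffer WHERE created_at < NOW() - INTERVAL 4 HOUR")
    db.commit()
```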

The New Architecture Using MySQL (AuroraDB) and S3 (Source: Zendesk Engineering Blog)

To handle queries, the new solution performs a look-up against the MySQL metadata table, followed by a set of parallel S3 Select queries against the files returned by the look-up. Since the data layout was optimized for chronological retrieval, the team ran into problems executing more complex queries.
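Here is a minimal sketch of that read path, again using hypothetical table, column, and bucket names: the metadata look-up narrows the requested time range down to a list of S3 keys, and each key is then queried with S3 Select in parallel and the results merged.

```python
# Illustrative read path: metadata look-up in MySQL, then parallel S3 Select calls.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3
import pymysql

BUCKET = "example-log-archive"  # hypothetical bucket name
s3 = boto3.client("s3")
db = pymysql.connect(host="mysql-host", user="app", password="...", database="logs")

def files_for_range(start_ts, end_ts):
    """Return the S3 keys whose metadata time range overlaps the query range."""
    with db.cursor() as cur:
        cur.execute(
            "SELECT s3_key FROM log_files "
            "WHERE max_created_at >= %s AND min_created_at <= %s",
            (start_ts, end_ts),
        )
        return [row[0] for row in cur.fetchall()]

def select_from_file(key, start_ts, end_ts):
    """Run an S3 Select query against one newline-delimited JSON file."""
    resp = s3.select_object_content(
        Bucket=BUCKET,
        Key=key,
        ExpressionType="SQL",
        Expression=(
            "SELECT * FROM S3Object s "
            f"WHERE s.created_at BETWEEN '{start_ts}' AND '{end_ts}'"
        ),
        InputSerialization={"JSON": {"Type": "LINES"}},
        OutputSerialization={"JSON": {}},
    )
    records = b"".join(
        event["Records"]["Payload"] for event in resp["Payload"] if "Records" in event
    )
    return [json.loads(line) for line in records.decode().splitlines() if line]

def query_logs(start_ts, end_ts):
    keys = files_for_range(start_ts, end_ts)
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(lambda k: select_from_file(k, start_ts, end_ts), keys)
    return [record for batch in results for record in batch]
```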

Shane Hender, group tech lead at Zendesk, explains the challenges with flexible querying in the new architecture:

Once we had the above all working we encountered a performance problem when the client wanted to filter results by any field besides the timestamp. For example, if the client wanted logs for a specific user-id, then we’d at worst have to scan ALL the S3 data within the time-range to find any relevant logs, which made it difficult to figure out what queries to do in parallel.

Engineers considered duplicating data in S3 to handle more filterable fields, but the approach wasn't feasible given the number of field combinations. Eventually, they turned to Bloom filters, combined with Count-Min Sketch data structures, which together offered an effective way of supporting multi-field filter queries. The improved solution requires an additional table that stores the serialized data structures used to determine which S3 files to query.
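The toy example below shows the file-pruning idea with a hand-rolled Bloom filter; it is not Zendesk's implementation, and the Count-Min Sketch component (which can estimate how many matching records a file holds) is omitted for brevity. A Bloom filter answers "possibly present" or "definitely absent" for a value such as a user-id, so a negative result lets the query planner skip a file entirely instead of scanning it with S3 Select.

```python
# Toy Bloom filter used for file pruning; names and sizes are illustrative only.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 8192, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, value: str):
        # Derive k bit positions per value from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))

# Write path: build one filter per S3 file over the filterable field values,
# then store its serialized bits in the additional MySQL metadata table.
file_filter = BloomFilter()
for user_id in ("u-1001", "u-1002", "u-1003"):
    file_filter.add(user_id)
serialized = bytes(file_filter.bits)

# Read path: only run S3 Select against files whose filter might contain the value.
restored = BloomFilter()
restored.bits = bytearray(serialized)
if restored.might_contain("u-1001"):
    print("query this file with S3 Select")
else:
    print("skip this file entirely")
```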

After the migration, Zendesk reduced storage costs to less than 20% of the provisioned DynamoDB costs, with MySQL (AuroraDB) accounting for over 90% of the new bill and S3, together with S3 Select, for less than 10%. The new solution offers query latencies of around 200-500 milliseconds, with occasional spikes into seconds, which the team is looking to optimize further.


Community comments

  • Question related to requirements.

    by Richard Clayton,

    Once we had the above all working we encountered a performance problem when the client wanted to filter results by any field besides the timestamp. For example if the client wanted logs for a specific user-id, then we’d at worst have to scan ALL the S3 data within the time-range to find any relevant logs, which made it difficult to figure out what queries to do in parallel.


    Was this not a required feature of the new design? I would expect the need for GSIs in DynamoDB to be due to the need to filter by different fields.

    Either way, thank you for detailing your migration off of DynamoDB. We have a similar scenario where we feel the development complexity and operational cost of using DynamoDB may not make sense (at least at our scale).

  • Why MySQL over postgres?

    by James Percy,

    Is there a specific reason you chose MySQL over postgres?
    Part of me wonders if postgres partitioning would have helped in this scenario?

  • Re: Question related to requirements.

    by Rafal Gancarz,

    Hi Richard, thanks for your comment.

    I presume the requirement was known from the outset, as you pointed out. The authors of the blog post are most likely explaining their design process.

    Like with any other technical decision, it is all about the context, and I've chosen to cover this use case because I expect many other organizations will be facing similar challenges. The Zendesk team has described a reasonably complete solution to their problem, so I hope at least some of the ideas will be applicable to others.
