
How Honeycomb Used Serverless to Speed up Their Servers: Jessica Kerr at QCon San Francisco 2022

At QCon San Francisco 2022, Jessica Kerr, principal developer evangelist at Honeycomb, presented "Honeycomb: How We Used Serverless to Speed Up Our Servers." The talk is part of the editorial track "Architectures You've Always Wondered About."

Kerr began her talk by recreating a routine morning for a software engineer who, after brewing their morning coffee, fails to investigate abnormalities on a metrics dashboard. She drew the audience's attention to the importance of responsive observability tools, and to why Honeycomb's architectural decisions are anchored around providing a responsive, highly interactive, and flexible experience.

To achieve this vision, Honeycomb created "Retriever", a special-purpose datastore for real-time event aggregation and interactive querying over telemetry data. Retriever is a distributed datastore that writes to local disks. A query request is routed to one lead retriever, which fans out sub-queries to the retrievers that hold the relevant data on their local disks.
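The scatter-gather pattern described above can be sketched roughly as follows. This is an illustrative Go sketch, not Honeycomb's actual code: the peer names, `queryLocal` placeholder, and the count-based merge are all assumptions for demonstration.

```go
package main

import (
	"fmt"
	"sync"
)

// partialResult is what each peer retriever returns for its local data;
// the shape is illustrative, not Honeycomb's internal API.
type partialResult struct {
	node  string
	count int
}

// queryLocal stands in for a peer retriever aggregating events
// read from its own local disk.
func queryLocal(node string) partialResult {
	return partialResult{node: node, count: 42} // placeholder aggregation
}

// leadQuery plays the role of the lead retriever: it fans the query out
// to the peers holding the data, then merges their partial aggregates.
func leadQuery(peers []string) int {
	var wg sync.WaitGroup
	results := make(chan partialResult, len(peers))
	for _, p := range peers {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			results <- queryLocal(node)
		}(p)
	}
	wg.Wait()
	close(results)

	total := 0
	for r := range results {
		total += r.count
	}
	return total
}

func main() {
	fmt.Println(leadQuery([]string{"retriever-1", "retriever-2", "retriever-3"}))
}
```

The key property is that each sub-query runs against local disk in parallel, so the lead retriever's latency is bounded by the slowest peer rather than the sum of all of them.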

Retriever is also a distributed column store where each column is stored in a separate file. Data are further broken into segments by duration and size, and the earliest and latest timestamps are recorded for each segment. This allows Honeycomb to dynamically aggregate any field across any time range by reading only the segments that overlap the requested duration.
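The segment-pruning idea reduces to a simple interval-overlap test on the recorded timestamps. A minimal sketch, assuming Unix-second timestamps and an illustrative `Segment` type (not Honeycomb's actual schema):

```go
package main

import "fmt"

// Segment records the earliest and latest event timestamps it contains
// (Unix seconds); field names are illustrative.
type Segment struct {
	Start, End int64
}

// overlaps reports whether a segment contains any data inside the
// queried time range, so only overlapping segments need to be read.
func (s Segment) overlaps(qStart, qEnd int64) bool {
	return s.Start <= qEnd && s.End >= qStart
}

func main() {
	segments := []Segment{{0, 99}, {100, 199}, {200, 299}}
	// Query for the range [150, 250]: only the last two segments overlap.
	for _, s := range segments {
		fmt.Println(s.overlaps(150, 250))
	}
}
```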

As Honeycomb's customers grew in number and size, it started sending data to AWS S3 buckets. But this change made Honeycomb's response times far longer. Queries that requested a large amount of data could run as long as 60 minutes, which is, in coffee time, long enough to roast your beans and brew a fresh cup - far above Honeycomb's standards.

By analyzing the compute demand profile of Honeycomb's workload, the team learned that the required compute is extremely spiky and short-lived. This is very different from the compute profile EC2 offers, but it is precisely what AWS Lambda provides. To top it off, Lambda is also "next door" to S3.

With the addition of Lambda, the Retriever architecture works as follows: whenever Retriever needs to access data in S3, a Lambda function is spun up for roughly every eight segments; each function retrieves its segments from S3, performs the aggregation, and sends the intermediate results back to Retriever.
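The per-invocation batching can be sketched as below. The batch size of 8 comes from the talk; the helper itself, and the use of plain integer segment IDs, are assumptions for illustration.

```go
package main

import "fmt"

// batchSegments groups S3-resident segment IDs into batches of up to
// batchSize, one batch per Lambda invocation. Each invoked function would
// read its segments from S3, aggregate them, and return an intermediate
// result to Retriever.
func batchSegments(segmentIDs []int, batchSize int) [][]int {
	var batches [][]int
	for start := 0; start < len(segmentIDs); start += batchSize {
		end := start + batchSize
		if end > len(segmentIDs) {
			end = len(segmentIDs)
		}
		batches = append(batches, segmentIDs[start:end])
	}
	return batches
}

func main() {
	ids := make([]int, 20)
	for i := range ids {
		ids[i] = i
	}
	// 20 segments with a batch size of 8 yields 3 Lambda invocations.
	fmt.Println(len(batchSegments(ids, 8)))
}
```

Because every batch maps to an independent Lambda invocation, the S3 reads and aggregation all run in parallel, which matches the spiky, short-lived compute profile described above.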

The result is that Honeycomb achieved performance that scales sub-linearly with the amount of data queried. Reusing the coffee metaphor, Kerr went on to say, "Now, if you have to perform a 60-day query, you are probably gonna get a sip in, but you definitely don't have to go get another cup."

Honeycomb observed a 50ms median startup time for its Lambda functions, with very little difference between warm and cold starts, and 90% of invocations return a response within 2.5 seconds. Lambda is 3-4 times more expensive than EC2 for the same amount of compute, but it is invoked far less often.

Over the two-year journey of adopting Lambda this way, Honeycomb learned a few considerations. The first is service limits. By default, AWS imposes a burst limit that scales linearly, in steps, toward the concurrency limit only under sustained load. This wasn't helpful to Honeycomb because its usage pattern has no sustained load; it was able to raise the burst limit after talking to its AWS account representatives.


Honeycomb has made several other optimizations, such as adjusting the default timeout limit, migrating to ARM-based Lambda functions, language-level optimizations, and a more performant compression library.

Kerr closed out her talk by summarizing the work to do if you want to adopt serverless in your architecture:

  • Use Lambda for real-time bulk workloads that are urgent

  • Make data accessible in the cloud and divide them into parallel workloads

  • Before scaling out, tune and optimize properly, use observability layers, and measure (especially cost) carefully

  • Last but not least, architecture doesn’t matter unless users are happy.

Other talks from the "Architectures You've Always Wondered About" track will be recorded and made available on InfoQ over the coming months. The next QCon conference is QCon Plus, held online.

Kerr has previously been interviewed by Charles Humble, and the interview and the transcript are available online on InfoQ.
