Bio: Yao Yue is the tech lead and manager of the Cache Team at Twitter and has spent the past few years maintaining, designing, and implementing various cache-related projects. She maintains several open source projects, including Twemcache and Twemproxy. In her spare time, Yao has been introducing programming to middle/high school girls via drawing and games. She is also a baker.
CRAFT is about software craftsmanship, presenting which tools, methods, and practices should be part of the toolbox of a modern developer and company, and serving as a compass on new technologies and trends.
My name is Yao. I'm an engineer at Twitter and I've been there for about six years. All this time, I have been working on cache, so you could call me sort of the resident expert on cache at Twitter.
At Twitter, we use cache very heavily, and we care about the quality of the infrastructure that we maintain. In the past couple of years, we had the opportunity to rethink the problem of how caching should be done in data centers, and we created a project called Pelikan that helps us work towards this goal. This talk is mostly about the practice and the stories that happened while we did this work and eventually open sourced it.
There are very well-known projects that work on in-memory caching. Memcached is probably the most widely known, and Redis is a very famous one that can be used as a cache. When we look at these existing solutions, we notice a lot of similarities in what they do and how the code is structured, and we think it's interesting to treat caching as a generic problem: you have certain goals, you want data in certain places, and you want to access it with certain transports.
So this is our effort to formalize what caching in data centers is, and this is our framework for saying that regardless of what product you want to use or what kind of data structure you want to implement, you can work on that within a generic framework. Pelikan cache is our effort to unify all these cache offerings: give them a definition, give them a description, give them the same architecture and the same look, while we continue to use existing protocols as well as adding new features and new protocols.
That will be a yes-and-no kind of answer. We certainly learned a tremendous amount by operating these caches. We also forked them, applying the kinds of patches that we thought were helpful. Through this experience, a more generic design sort of emerged, and we formed an idea of how to abstract the different layers.
So by design, we actually came up with a clean-slate architecture saying this is how we want cache to be structured, and in practice, we shamelessly steal code from existing projects, because that code is very well tested and very good. We fit existing implementations, in small pieces, into the overall design that we have, so it ends up being more of a hybrid system: it's in its own code base and has its own design, but if you look carefully at the details, you will see a lot of familiar code that you will find in other open source projects.
The current components we have built are the equivalents of, say, a Redis server or memcached, so they are more or less standalone cache servers. We do have plans to build other components; for example, some people have been using Twemproxy, which allows a number of these cache backends or cache servers to be organized as a cluster. Those are on our future roadmap, but so far, what you will get are standalone cache servers.
We aim to replace, with Pelikan, all the existing instances of Twemcache, which is what we run in place of memcached at Twitter and is very similar to it, as well as all the existing instances of Redis, which we also use very heavily. Basically, this allows us to continue to use the existing design and just swap out the backend, so we can maintain two different protocols, or two different services, from a single code base, which reduces the technical debt and the operational burden that we have to take on.
Werner's full question: Memcached and Redis are quite popular. So what are some of the key differences in Pelikan to these others? What are you doing better?
We really set out to find the kinds of things we wanted to improve over existing projects, because these are very successful projects with a lot of merit. Of course, I'm not going to reiterate what they're good at, because I think that's well known, but through operations, we actually found certain things that are not quite what we want.
For example, logging is done in a very trivial way in these systems: if you want to write a log, you actually call the system call write(), and sometimes, very occasionally, that slows the entire system down. But people have very stringent requirements for how their cache servers should respond. So we rewrote the logging logic to never touch the disk directly. These are the sorts of things we improve. We also improved, for example, stats and metric collection, and we improved how the system is composed, so for us, writing a new protocol is just a matter of a couple of thousand lines, and they fit nicely into the rest of the system.
So basically, I guess our improvement is, first, that it's more operations-friendly at large scale, and also that structurally it allows us to improve and add features in the future with very little change to the existing code base.
I think a lot of this operational pain, like determinism in runtime behavior, is much more prominent at scale. If you have two instances and the load is not very high, some of these events are so rare that they don't really matter much. But if you have tens of thousands of instances, you get into all kinds of weird situations. To be able to maintain a very large-scale operation with minimum engineering effort, we need unlikely events to be even less likely. So I think you will see the problems we see if you have the same kind of scale or larger; on the other hand, I think smaller operations will feel a less urgent need to adopt this.
On the other hand, it's always nice to have something that works exactly as you want, so if this applies to you, there's basically no downside to using the system that we built.
That's a latency problem, yes. In a system that has very stringent latency requirements, you want to eliminate anything that's not deterministic, right? write() is one thing that's mostly very fast but occasionally slow. Even something like malloc, which people would argue is extremely reliable, can in some situations take a while, right? These are the unlikely events that we have to take care of.
Well, if you have another process running on the same hardware, which you have no knowledge of, that tries to read or write large chunks of data from disk, there's contention at the disk level, and this is something the cache server has no control over, because it shares the same hardware.
So basically, it's not a correctness problem; it's just that anything you cannot deterministically control should be moved out of your fast path. We like to think about it in networking terms: there's the data plane, or the fast path, and there's the control plane, or the slow path. We want anything that should be fast to stay on the fast path, and anything that could slow down the fast path to be moved to the control plane.
Werner's full question: That makes sense, yes. So Pelikan I think is written in C, right? So you don't have to worry about the big nasty garbage collector.
No. There's this running joke, because Twitter is a Scala shop, right? Why don't you write the cache in Scala? But I don't think anybody is serious about that, because we all know the strengths and weaknesses of low-level languages, and C seems to be a good fit for this type of task.
Werner: But still malloc as you say, can occasionally trip you up as well.
Yes. We deal with it by pre-allocating as much as we can. If we allow a thousand connections, why don't we just allocate a bunch of buffers for those connections up front and keep reusing them? These things are not particularly hard to do; we just need to be aware of the impact of doing or not doing them.
Yes. I can tell you the kind of SLA we promise to our customers. When we do load testing, not only do we test throughput, we also test tail latencies, and the kind of SLA we come up with is usually that we can support, say, 300,000 requests per second per instance, under the constraint of keeping latency under one millisecond in our data center environment.
We do try to be very fast, and there's a reason for that. Twitter has a microservice architecture, so all these individual services could use cache, which means we have a lot of internal customers. When you tell them how your cache cluster should perform, you want to give them a very easy answer that they can understand, instead of saying, “It's mostly fast but sometimes it's slow.” Otherwise, when their service becomes slow, the cache becomes the suspect: is it because my service is slow, or is it because my cache is slow?
So after answering that kind of question maybe a dozen or two dozen times, you come up with the idea that maybe you should guarantee it's really never slow, so we're not on the hook for debugging other people's systems.
Werner: So you can always say “It's your fault”.
Exactly. That's a tremendously advantageous position to be in. We can show people that their cache is always fast, so they can look somewhere else and narrow down the scope of debugging more quickly.
Werner: Okay. That's a very good strategy.
Well, that's sort of the one thing you learn very quickly if you have to support a very large-scale operation with lots of customers: you try to get everything under control.
Werner's full question: Maybe one more question about memory allocation. Do you do dynamic memory allocation? Do you have your own malloc or do you use a special malloc?
It's controllable; pre-allocation is a configuration option. If you want to do allocation on demand, you can configure the system to do so. If you want to pre-allocate because you have a good idea of the demand, you can do that too, and you can also control how much buffer you want to pre-allocate. This gives us the flexibility to cater to different use cases.
Right. We haven't gotten to the point of benchmarking across different malloc implementations, largely because if we pre-allocate, the difference becomes less important. But we do have wrappers around malloc, so instead of calling malloc directly, we go through a very small macro, and you can plug in any implementation you want.
One interesting thing about this very lightweight wrapping is that it can actually help us catch bugs. It's not so much the case for malloc, but a lot of bugs can happen when you call realloc, because realloc usually just extends or shrinks the allocation in place and returns the same address. Occasionally, though, if the data cannot fit at the original location, it will move your pointer to a different location, and those are the kinds of bugs that, unless you can reliably reproduce the behavior, are very hard to catch. We actually saw that in production.
So if you have a wrapper, you can say, “I'm going to implement my own realloc for testing, so that it always changes the address of the returned pointer,” and then you can catch that kind of nondeterministic bug very quickly, which you cannot really do if you call the libc interface directly.
Werner: Interesting. So it's kind of a mocking strategy.
Yes. That's correct.
We have a project homepage at pelikan.io, and we also have the Pelikan GitHub repo as part of the Twitter organization, so if you go to github.com/twitter/pelikan, you can find the source code.
Werner: Well, that's very interesting. I think our audience has homework to check out your work. And thank you.
Thank you. And I'm very happy to hear feedback about the quality and the design of the project, if people have comments.