Reliable Auto-Scaling using Feedback Control
When deploying a server application to production, we need to decide on the number of active server instances to use. This is a difficult decision, because we usually do not know how many instances will be required to handle a given traffic load. As a consequence, one is forced to use more (possibly: significantly more) instances than actually required in order to be "on the safe side". Since servers cost money, this makes things unnecessarily expensive.
In fact, things are worse than that. Traffic is rarely constant throughout the day. If we deploy instances with peak traffic in mind, we basically guarantee that most the provisioned servers will be under-utilized most of the time. In particular in a cloud-based deployment scenario, where instances can come and go at any moment, we should be able to realize significant savings by having only as many instances active at any moment as are required to handle the momentary load.
One possible approach to this problem is to use a fixed schedule, in which we figure out (somehow) the required number of instances for each hour of the day. The difficulty is that such a fixed schedule can not handle random variations: if for some reason traffic is 10 percent higher today than yesterday, the schedule will not be capable of providing the additional instances that are required to handle the unexpected load. Similarly, if traffic peaks half an hour early, a system based on a fixed schedule will not be able to cope.
Instead of a fixed (time-based) schedule, we might consider a rule-based solution: for any given traffic intensity, we have a rule that specifies the number of server instances to use. This solution is more flexible than the time-based schedule, but it still requires us to know (ahead of time) how many servers are enough for each traffic load. And what happens when the nature of the traffic changes — as may happen, for example, if the fraction of long-running queries increases. The rule-based solution will not be able to respond properly.
Feedback control is a design paradigm that is fully capable of handling all these challenges. Feedback works by constantly monitoring some quality-of-service metric (such as the response time), and then making appropriate adjustments (such as adding or removing servers) if this metric deviates from its desired value. Because feedback bases its control actions on the actual behavior of the controlled system, it is capable of handling even unforeseen events (such as traffic that exceeds all expectations). Moreover, and in contrast to the rule-based solution sketched earlier, feedback control requires very little a-priori information about the controlled system. The reason is that feedback is truly self-correcting: because the quality-of-service metric is monitored constantly, any deviation from the desired value is spotted and corrected immediately; and this process is repeated as necessary. To put it simply: if the response time deteriorates, a feedback controller will simply activate additional instances, and if that does not help, it will add some more. That's all.
Feedback control has long been a standard method in mechanical and electrical engineering, but it does not seem to be used much as a design concept in software architecture. As a paradigm that specifically applies in situations of incomplete information and random variation, it is rather different than the deterministic, algorithmic solutions typical of computer science. Finally, although feedback control is conceptually simple, deploying an actual controller to a production environment requires knowledge and understanding of some practical "tricks" in order to work. In this article, we will introduce the concepts and point out some of the difficulties.
Nature of a Feedback Loop
The basic structure of a feedback loop is shown in the figure. On the right, we see the controlled system. Its "output" is the relevant quality-of-service metric. The value of this metric is continuously supplied to the controller, where it is compared to its desired value, which is supplied from the left. (The desired value of the system's output metric is referred to as the "setpoint".) Based on these two inputs (namely the desired and the actual value of the quality-of-service metric), the controller computes an appropriate control action for the controlled system. For instance, if the actual value of the response time is worse than the desired value, the control action might consist of the number of additional server instances to activate.
The figure shows the generic structure of all feedback loops. Its essential components are the controller and the controlled system. Information flows from the system's output via the return path to the controller, where it is compared to the "setpoint". Given these two inputs, the controller decides on an appropriate control action.
So, what does a controller actually do? How does it determine what action to take?
To answer this question, it helps to remember that the primary purpose of using feedback control is to minimize the deviation of the actual system output from the desired output. This deviation can be expressed as the "tracking error":
error = actual - desired
The controller can do anything it deems suitable to reduce this error. We have absolute freedom in designing the algorithm — but we will want to take knowledge of the controlled system into account.
Let's consider again the data center situation. We know that increasing the number of servers reduces the average response time. So, we can choose as control strategy to simply increment the number of active servers by one whenever the actual response time is worse than its desired value (and decrement the server count in the opposite case). But we can do better than that, because this algorithm does not take the magnitude of the error into account, only its sign. Surely, if the tracking error is large, we should make a larger adjustment than when the tracking error is small. In fact, it is common practice to let the control action be proportional to the tracking error:
action = k error
where k is some numerical value. With this choice of control algorithm, large deviations lead to large corrective actions, whereas small deviations lead to correspondingly smaller corrections. Both aspects are important: large actions are required in order to reduce large deviations quickly. But it is also important to let control actions become small if the error is small — only if we do this does the control loop ever settle down to a steady state. Otherwise, the behavior will always oscillate around the desired value: an effect we usually wish to avoid.
We said it earlier: there is considerable freedom in choosing a particular algorithm for the implementation of the feedback controller, but it is usually a good idea to keep it simple. The "magic" of feedback control lies in the loopback structure of the information flow, not so much in a particularly sophisticated controller. Feedback control incurs a more complicated system architecture in order to allow for a simpler controller.
One thing, however, is essential: the control action must be applied in the correct direction. In order to guarantee this, we need to have some understanding of the behavior of the controlled system. Usually, this is not a problem: we know that more servers mean better response times, and so on. But it is a crucial piece of information that we must have.
Thus far, our description of feedback control has been largely conceptual. However, when attempting to turn these high-level ideas into a concrete realization, some implementation details need to be settled. The most important of these concerns the magnitude of the control action that results from a tracking error of a given size. (If we use the formula given earlier, this amounts to choosing a value for the numerical constant k.)
The process of choosing specific values for the numerical constants in the controller implementation is known as controller "tuning". Controller tuning is the expression of an engineering trade-off: if we choose to make relatively small control actions, then the controller will respond slowly and tracking errors will persist for a long time. If, on the other hand, we choose to make rather large control actions, then the controller will respond much faster — but also be at risk of "over-correcting" and thereby incurring an error in the opposite direction. If we let the controller make even larger corrections, it is possible for the control loop to become unstable: if this happens, the controller tries to compensate each deviation with an ever increasing sequence of control actions, thus swinging wildly from one extreme to the other while increasing the magnitude of its actions all the time. Instability of this form is highly detrimental to smooth operations and therefore must be avoided. The challenge of controller tuning therefore amounts to finding control actions that are as large as possible without making the loop unstable.
A first rule of thumb in choosing the size of control actions is to work backwards: given a tracking error of a certain size, how large would a correction need to be to eliminate this error entirely? Remember that we do not need to know this information precisely — the self-correcting nature of feedback control assures that there is considerable tolerance in choosing values for the tuning parameters. But we do need to get at least the order of magnitude right. (In other words: to improve the average query response time by 0.1 seconds — do we need to add roughly one server, ten servers, or a hundred?)
Some systems are slow to respond to control actions. For instance, it may take several minutes before a newly requested (virtual) server instance is ready to receive incoming requests. If this is the case, then we must take this lag or delay into account: while the additional instances are being spun up the tracking error will persist, and we must prevent the controller from requesting further and further instances — otherwise, we will eventually have way too many active servers online! Systems that do not respond immediately pose specific challenges and require more care. However, systematic methods exist to "tune" such systems. (Basically, one first needs to understand the duration of the lag or delay; then one can use specialized plug-in formulas to obtain values for the tuning parameters.)
We must keep in mind that feedback control is a reactive control strategy: things must first go out of whack (at least a little), before any corrective action can take place. If this is not acceptable, then feedback control might not be suitable. In practice, this is usually not a problem: a well-tuned feedback controller will detect and respond even to very small deviations, and generally keep a system much closer to its desired behavior than a rule-based strategy or a human operator would.
A more serious concern is that no reactive control strategy is capable of handling disturbances that occur much faster than control actions can be applied. For instance, if it takes several minutes to bring additional server instances online, then we will not be able to respond to traffic spikes that build up within a few seconds or less. (At the same time, we will have no problem handling changes in traffic that build up over several minutes or hours.) If we need to handle very spiky loads, then we must either find a way to speed up control actions (for instance by having servers on hot standby), or by employing mechanisms that are not reactive (such as using message buffers).
Another question that deserves some consideration is the choice of the quality-of-serve metric to be used. Ultimately, the only thing the feedback controller does is to keep this quantity at its desired value — hence we should make sure that the metric we choose is indeed a good proxy for the behavior that we want to maintain. At the same time, this metric must be available, immediately and at all times. (We cannot build an effective control strategy on some metric that is only available after a significant delay, for instance.) A final consideration is that this metric should not be too noisy, because noise tends to "confuse" the controller. If the relevant metric is naturally noisy, then it usually needs to be smoothed before it can be used as control signal. (For instance, the average response time over the last several requests provides a better signal than just the response time of the most recent request: taking the average has the effect of smoothing out random variations.)
Although we have introduced feedback control here in terms of data center auto-scaling, it has of course a much wider area of applicability: wherever we need to maintain some desired behavior, even in the face of uncertainty and change, feedback control should be considered as an option. It can be more reliable than deterministic approaches and simpler than rule-based solutions. But it requires a novel way of thinking and knowledge of some special techniques to be effective.
This short article can only introduce the basic notions of feedback control. More information is available on my blog and in my book on the topic (Feedback Control for Computer Systems; O'Reilly, 2013).
About the Author
Philipp K. Janert provides consulting services for data analysis and mathematical modeling, drawing on his previous careers as physicist and software engineer. He is the author of the best-selling Data Analysis with Open Source Tools (O'Reilly), as well as Gnuplot in Action: Understanding Data with Graphs (Manning Publications). In his latest book, Feedback Control for Computer Systems, he demonstrates how the same principles that govern cruise control in your car also apply to data center management and other enterprise systems. He has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company's website.
Camille Fournier May 21, 2015