Force12.io Create a ‘Microscaling’ Framework for Apache Mesos
Force12.io have released a prototype ‘microscaling’ container demonstration running on the Apache Mesos cluster manager. They claim it starts and stops ‘priority 1’ and ‘priority 2’ containers more rapidly than traditional autoscaling approaches in response to simulated demand for the two workload types.
Efficient autoscaling of compute resources is a challenge for many companies with varied usage demand and workload types. For example, Netflix have previously discussed how, in addition to using traditional autoscaling approaches, they created a tool named ‘Scryer’ that uses machine learning techniques to attempt to predict current and future demand and preemptively adjust the resources available on the underlying compute platform. Netflix have since talked about a new Mesos-based platform named ‘Titan’ that will allow more reactive scaling, and about the ‘Fenzo’ scheduler, which can be incorporated into custom Mesos frameworks.
Force12.io have been active in this space for some time, and have recently shared public demonstrations of a Mesos-based ‘autoscaling’ scheduler, which they claim is capable of ‘microscaling’, a term coined to describe scaling that is more fine-grained and timely than traditional autoscaling. The Force12.io scheduler runs on Apache Mesos within Mesosphere’s Marathon framework, and starts and stops ‘priority 1’ (high priority) and ‘priority 2’ (low priority) containers based on simulated (random) demand. This priority-based approach appears similar to that used by Google’s internal ‘Borg’ cluster manager, which John Wilkes discussed publicly at QCon London 2015.
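To make the priority idea concrete, the sketch below models a fixed pool of container slots in which priority 1 work always gets the slots it asks for and priority 2 work soaks up whatever is left. This is not Force12.io's code; the function names, the fixed pool size and the use of `random.randint` for demand are assumptions chosen to mirror the "simulated (random) demand" described above.

```python
from __future__ import annotations

import random


def split_capacity(capacity: int, p1_demand: int) -> tuple[int, int]:
    """Hypothetical helper: return (priority-1, priority-2) container counts.

    Priority 1 gets min(demand, capacity); priority 2 fills the remaining slots.
    """
    p1 = max(0, min(p1_demand, capacity))
    p2 = capacity - p1
    return p1, p2


def simulate(capacity: int, steps: int, seed: int = 0) -> list[tuple[int, int, int, int]]:
    """Hypothetical simulation loop: random priority-1 demand each tick.

    Each history entry is (p1_target, p2_target, p1_delta, p2_delta), where a
    positive delta means containers to start and a negative one means stop.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    running = (0, 0)
    history = []
    for _ in range(steps):
        demand = rng.randint(0, capacity)  # simulated (random) priority-1 demand
        target = split_capacity(capacity, demand)
        delta_p1 = target[0] - running[0]
        delta_p2 = target[1] - running[1]
        history.append((target[0], target[1], delta_p1, delta_p2))
        running = target
    return history
```

In a real scheduler the negative deltas for priority 2 would be applied first, so that low-priority containers free their slots before high-priority ones are started.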
InfoQ recently sat down with Ross Fairbanks, chief architect at Force12.io, and discussed the work on the company’s Mesos scheduler and the future of ‘microscaling’.
InfoQ: Thanks for taking the time out to talk to InfoQ today, Ross. Could you briefly introduce yourself and the motivation for the experiments you have documented in the Force12.io blog post?
Fairbanks: My name’s Ross Fairbanks and I’m Chief Architect at Force12.io, where we're working on real-time container scaling. Part of my background is in retail e-commerce, working on sites with spiky traffic patterns. This led to an interest in autoscaling, but doing this with virtual machines (VMs) is hard. The lead time for adding capacity is usually several minutes, which necessitates workarounds like scaling up quickly and scaling down slowly.
So fast startup and shutdown times are the property of containers that interests us most. With containers it’s possible to add or remove capacity in seconds or even sub-seconds. So far we've demonstrated the concept by building demos on the AWS EC2 Container Service (ECS) and Apache Mesos. Now we're working on open sourcing our solution so you can run it on your own infrastructure.
InfoQ: We see you are using the term 'microscaling' rather than 'autoscaling'. What is the reasoning behind this?
Fairbanks: We want to differentiate microscaling from traditional autoscaling, where you add or remove capacity but it takes minutes. With microscaling you use your existing capacity more efficiently and scale containers in seconds or sub-seconds.
Microscaling works well with microservices architectures, where different services have different traffic patterns. In retail, for example, a search service will be busy when a marketing email is sent out, whereas a fulfilment system will only be busy later, when the resulting orders are shipped.
We also think both technologies can work together, especially in public cloud environments. Microscaling can be used to respond quickly to spikes in demand while extra VM capacity is added via autoscaling. Netflix are already doing this on AWS with their Titan system, and they've open-sourced the Fenzo framework for Mesos. With Force12 we want more organisations to be able to reach this level of server utilisation. Force12 will also be platform agnostic, and we plan to support all the major container schedulers.
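The combination described above can be pictured as a fast path and a slow path: containers are started immediately within the capacity that already exists, and only the shortfall is turned into a request for more VMs. The sketch below is a simplified illustration, not Force12.io's or Netflix's implementation; the `plan` function, the `ScalingPlan` type and the fixed `slots_per_vm` figure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingPlan:
    containers: int      # containers to run now on existing VMs (fast path, seconds)
    vms_to_request: int  # extra VMs to ask the autoscaler for (slow path, minutes)


def plan(demand: int, vms: int, slots_per_vm: int) -> ScalingPlan:
    """Hypothetical planner: fill existing capacity first, then request VMs."""
    capacity = vms * slots_per_vm
    containers = min(demand, capacity)
    shortfall = demand - containers
    # Integer ceiling division: VMs needed to cover the remaining demand.
    vms_needed = -(-shortfall // slots_per_vm) if shortfall > 0 else 0
    return ScalingPlan(containers=containers, vms_to_request=vms_needed)
```

For example, with two VMs of ten slots each and a demand of 25 containers, the plan runs 20 containers straight away and asks the autoscaler for one more VM for the remaining five.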
InfoQ: What was the motivation for building upon Mesos and Marathon, in comparison with other alternatives?
Fairbanks: We see Force12 as being a container scheduler that cooperates with other schedulers and specialises in microscaling. Marathon is an advanced scheduler which supports fault tolerance. It also has a good REST API that let us integrate with it easily. For cooperating schedulers to become a reality we think some scheduler standards will be needed and this is something we want to work with the community on.
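As an illustration of the kind of REST integration mentioned here, Marathon's API lets a client change an application's instance count by sending a `PUT` to `/v2/apps/{appId}` with a JSON body such as `{"instances": 3}`. The sketch below only builds that request with the Python standard library; the Marathon address and the `/priority1` app ID are hypothetical, and this is not Force12.io's client code.

```python
import json
import urllib.request

MARATHON_URL = "http://marathon.example:8080"  # hypothetical Marathon endpoint


def scale_request(app_id: str, instances: int,
                  base_url: str = MARATHON_URL) -> urllib.request.Request:
    """Build (but do not send) a request setting an app's instance count."""
    if instances < 0:
        raise ValueError("instances must be non-negative")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v2/apps/{app_id.lstrip('/')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending it is then a matter of calling `urllib.request.urlopen(scale_request("/priority1", 3))` against a reachable Marathon instance.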
InfoQ: The Force12.io blog post mentions that you have performed some special configuration/tuning of your Mesos cluster. What was the motivation behind this, and would the changes be applicable outside the experiment?
Fairbanks: Our Mesos cluster is running on CoreOS on EC2. We use Fleet to start Mesos, Marathon and ZooKeeper and we use Consul for service discovery. We've released the setup code and it can be run locally as 3 Vagrant VMs. We've also worked with Packet.net on this and we plan to move our Mesos demo onto their physical servers to test out the limits of microscaling on high performance hardware.
For tuning, we configured Marathon to launch tasks in parallel rather than sequentially, which is the default setting. We also reduced the Mesos allocation interval from its default of 1 second to 100 milliseconds. The other change that made a big difference was running a local Docker registry on our CoreOS cluster. We've found the choice and location of the Docker registry to be a key factor in both the Mesos and ECS demos. We've written a blog post about the Mesos demo that has more details on this.
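A back-of-the-envelope model helps show why these three changes compound. On the Mesos side the allocation interval is, as far as we know, set with the master's `--allocation_interval` flag. The function below is a deliberately simplified, hypothetical model rather than a measurement: it assumes a sequential launcher places one task per allocation round while a parallel one places all tasks in a single round, and that image pulls for the final round overlap. All input numbers in the usage note are made up.

```python
def estimated_launch_ms(n_tasks: int, allocation_interval_ms: int,
                        pull_ms: int, parallel: bool) -> int:
    """Hypothetical worst-case time (ms) to get n_tasks containers running.

    Assumptions (not from Force12.io's measurements):
    - each allocation round waits up to one allocation interval for an offer;
    - sequential launching needs one round per task, parallel needs one round;
    - the image pull happens once, after the last round, and overlaps across
      tasks launched in that round.
    """
    if n_tasks <= 0:
        return 0
    rounds = 1 if parallel else n_tasks
    return rounds * allocation_interval_ms + pull_ms
```

Under these assumptions, launching 10 tasks sequentially with a 1,000 ms interval and a 2,000 ms remote pull gives 12,000 ms, switching to parallel launches with a 100 ms interval gives 2,100 ms, and cutting the pull to 200 ms with a nearby registry brings it to 300 ms, which illustrates why the registry location can dominate once the scheduler itself is fast.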
InfoQ: Many cloud vendors are now offering container solutions (ECS, GKE, Triton, etc.). Do you think your research will be of interest to them?
Fairbanks: We think so, as microscaling can run on any container cluster and it works with the “Data Centre as Operating System” (DCOS) approach. We think as organisations move to "DCOS" they will start measuring their server utilisation more, and they will want to increase this and get the cost savings that are possible with microscaling.
InfoQ: What do you think the future of microscaling will hold, and how soon will it be a large-scale commercial reality (we appreciate that Amazon have already released AWS Lambda, which is a conceptually similar product)?
Fairbanks: For microscaling to become common we think containers first have to be running in production in many organisations. We believe this will happen over the next 12 to 18 months. Some organisations have chosen to put containers into production much sooner, often because they need the benefits that microservices architectures can provide, such as autonomous teams. We want to work with these early adopters and get microscaling ready for production use.
InfoQ: Thanks for your time today. Is there anything else you would like to share with the InfoQ audience?
Fairbanks: We’ll be releasing the first version of our open source solution very soon, and if you follow us on Twitter (we’re @force12io) you’ll hear as soon as it’s available. We’re really looking forward to getting some feedback on our ideas from the community.
Additional detail on the work undertaken by the Force12.io team can be found on the Force12.io blog. The Apache Mesos website contains information for developers who are interested in creating their own Mesos framework, and videos discussing a range of Mesos-related topics (from introductory material to advanced framework building) can be found on the MesosCon 2015 conference YouTube channel. The MesosCon EU conference is also running on October 8-9 in Dublin.