Amazon Web Services (AWS) has released an open source proof-of-concept scheduler driver that demonstrates how the Apache Mesos cluster manager could be integrated with the Amazon EC2 Container Service (ECS) preview.
An Apache Mesos framework driver called ECSSchedulerDriver has been released via the AWS Labs GitHub account. The driver allows Mesos cluster management “launch task” commands to be sent directly to ECS. The AWS Compute blog states that this driver demonstrates the potential for Amazon ECS to be integrated with the Mesos ecosystem, which would allow Mesos frameworks such as Chronos, Apache Spark and Mesosphere’s Marathon to launch tasks on Amazon ECS.
The AWS Compute blog states that this driver has been released to encourage collaboration between the Amazon ECS and Mesos communities, and the code is not intended for production use.
“This is an example of what can be done with Amazon ECS, and is not recommended for production use. We are working with the Mesos community to develop a more robust integration between Apache Mesos and Amazon ECS.”
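The forwarding idea behind such a driver can be sketched in a few lines. This is a minimal illustration only, not the ECSSchedulerDriver code: `MesosTask`, `FakeContainerService` and `ForwardingDriver` are hypothetical names, the request fields are assumptions, and the service is an in-memory stand-in rather than the real ECS API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MesosTask:
    """Hypothetical, simplified view of a task a Mesos framework wants to launch."""
    name: str
    image: str
    cpus: float


class FakeContainerService:
    """In-memory stand-in for a container service (an assumption, not the ECS API)."""

    def __init__(self) -> None:
        self.started: List[Dict[str, object]] = []

    def start_task(self, request: Dict[str, object]) -> str:
        # Record the request and hand back a made-up task identifier.
        self.started.append(request)
        return f"task-{len(self.started)}"


class ForwardingDriver:
    """Translates framework "launch task" calls into requests to the service."""

    def __init__(self, service: FakeContainerService, cluster: str) -> None:
        self._service = service
        self._cluster = cluster

    def launch_tasks(self, tasks: List[MesosTask]) -> List[str]:
        # Each framework task becomes one request sent to the service.
        return [
            self._service.start_task(
                {"cluster": self._cluster, "name": t.name,
                 "image": t.image, "cpus": t.cpus}
            )
            for t in tasks
        ]


service = FakeContainerService()
driver = ForwardingDriver(service, cluster="default")
task_ids = driver.launch_tasks(
    [MesosTask("web", "nginx", 0.5), MesosTask("cache", "redis", 0.25)]
)
print(task_ids)
```

The framework only talks to the driver, so in this sketch the backing service could be swapped without the framework changing.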
The Amazon ECS preview is a scalable container management service that supports Docker containers and allows distributed applications to be run on a managed cluster of Amazon EC2 instances. Amazon ECS provides an API that enables a developer to launch and stop container-enabled applications and to query the cluster state from a centralized service. It also provides access to several Amazon EC2 features such as security groups, Elastic Block Store (EBS) volumes and Identity and Access Management (IAM) roles.
The AWS Compute blog states that cluster management is becoming an important task in today’s market, as developers and businesses increasingly develop and deploy distributed applications in the cloud. Cluster management systems schedule work and manage the state of each cluster resource. Examples of modern cluster management systems include Apache Mesos, Google’s Kubernetes and Apache Hadoop YARN.
Common examples of a developer interacting with a cluster management system include running MapReduce jobs via Apache Hadoop or Apache Spark, and scheduling long-running services via Marathon or Apache Aurora. These frameworks typically manage a coordinated cluster of machines working together to perform a large task. In the case of Hadoop or Spark, these tasks are typically data analysis jobs or machine learning using a large data set. In the case of Marathon or Aurora, these tasks are most often application services running within a microservice-style platform.
The AWS Compute blog states that modern cluster management systems have two fundamental challenges. First, there is a large overhead from managing the state of the cluster. For example, a framework may require the orchestration of a centralised leader or coordination component, and must detect failures, replace machines, and restart the follower components that receive and act on commands.
The second challenge is that each of these systems typically assumes full ownership of the machines where its tasks are running. Often multiple individual clusters of machines must be deployed, each dedicated fully to the management system in use. This can lead to inefficient distribution of resources, and jobs taking longer to run than if a shared pool of resources could be used.
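A small worked example makes the cost of dedicated clusters concrete. The numbers and function names below are illustrative assumptions, not figures from the AWS Compute blog: two frameworks each own four machines, every job occupies one machine for one round, and the workloads are uneven.

```python
import math
from typing import Dict


def rounds_needed(jobs: int, machines: int) -> int:
    """Rounds required when each job occupies one machine for one round."""
    return math.ceil(jobs / machines)


def dedicated_rounds(jobs_per_framework: Dict[str, int], machines_per_cluster: int) -> int:
    # Each framework can only use its own cluster, so the busiest one sets the pace.
    return max(rounds_needed(j, machines_per_cluster) for j in jobs_per_framework.values())


def shared_rounds(jobs_per_framework: Dict[str, int], total_machines: int) -> int:
    # All jobs draw from a single pool of machines.
    return rounds_needed(sum(jobs_per_framework.values()), total_machines)


# Hypothetical workload: 6 batch jobs and 2 service jobs, 4 machines per cluster.
jobs = {"batch": 6, "services": 2}
dedicated = dedicated_rounds(jobs, machines_per_cluster=4)
shared = shared_rounds(jobs, total_machines=8)
print(dedicated, shared)  # prints "2 1"
```

With dedicated clusters, two machines in the services cluster sit idle while the batch cluster needs a second round. With the same eight machines shared, all eight jobs fit in one round.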
The AWS Compute blog states that ECS provides a solution to cluster state management. The management of followers (using the ECS Agent), dispatching of subtasks to the proper location, and state inspection of the cluster are all exposed through an API. Rather than a framework having to manage a set of machines directly, ECS manages the instances.
The Amazon ECS product overview web page states that one of the core principles behind the design of ECS is the separation of the scheduling logic from the state management. This allows schedulers provided by Amazon to be used, bespoke schedulers to be written by a developer, or third-party schedulers to be integrated with the service. The Apache Mesos cluster manager was chosen as a first proof of concept as it provides frameworks that support both batch jobs and long-running tasks.
“This demonstrates how we can quickly extend ECS based on customer feedback, in some cases, to co-exist and collaborate with existing open source tools such as Mesos and Marathon. You can also write your own schedulers for ECS if you have specific needs.”
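The separation of scheduling logic from state management can be sketched as follows. This is a toy model under stated assumptions: `ClusterState`, `Instance`, `first_fit`, `most_free` and `place` are hypothetical names, CPU is the only resource tracked, and nothing here calls the actual ECS API. The state holder tracks instances and placements, while each scheduler is a plain function that only decides where a task should go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Instance:
    instance_id: str
    free_cpu: int
    tasks: List[str] = field(default_factory=list)


class ClusterState:
    """Hypothetical stand-in for the state-management side of the service."""

    def __init__(self) -> None:
        self._instances: Dict[str, Instance] = {}

    def register(self, instance_id: str, cpu: int) -> None:
        self._instances[instance_id] = Instance(instance_id, cpu)

    def list_instances(self) -> List[Instance]:
        return list(self._instances.values())

    def start_task(self, instance_id: str, task: str, cpu: int) -> None:
        inst = self._instances[instance_id]
        if cpu > inst.free_cpu:
            raise ValueError(f"{instance_id} lacks capacity for {task}")
        inst.free_cpu -= cpu
        inst.tasks.append(task)


# A scheduler only chooses an instance; it never mutates cluster state.
Scheduler = Callable[[List[Instance], int], Optional[str]]


def first_fit(instances: List[Instance], cpu: int) -> Optional[str]:
    for inst in instances:
        if inst.free_cpu >= cpu:
            return inst.instance_id
    return None


def most_free(instances: List[Instance], cpu: int) -> Optional[str]:
    candidates = [i for i in instances if i.free_cpu >= cpu]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i.free_cpu).instance_id


def place(state: ClusterState, scheduler: Scheduler, task: str, cpu: int) -> Optional[str]:
    """Ask the scheduler for a target, then record the placement in the state."""
    target = scheduler(state.list_instances(), cpu)
    if target is not None:
        state.start_task(target, task, cpu)
    return target


state = ClusterState()
state.register("i-a", 2)
state.register("i-b", 4)
first = place(state, first_fit, "web", 1)     # first instance with room
second = place(state, most_free, "batch", 2)  # instance with the most free CPU
print(first, second)  # prints "i-a i-b"
```

Because both schedulers share one signature, a different placement policy can be swapped in without touching the state-tracking code.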
The proof-of-concept ECSSchedulerDriver Mesos framework driver can be found in the AWS Labs GitHub repository, alongside instructions for running a task using Marathon. The AWS Compute blog requests that feedback be given via the ECS forum.