InfoQ News

How Etsy Deploys More Than 50 Times a Day

At the last QCon London, Daniel Schauenberg described how Etsy, renowned for its DevOps and Continuous Delivery practices, deploys 50 times a day. A fully automated deployment pipeline, thorough application monitoring, and IRC-based collaboration are all essential to sustaining this rate of change while keeping risk to a minimum.

Etsy's development approach revolves around making many small, continuous changes, which in turn means deploying many times a day. In Daniel Schauenberg's words, at any given time every Etsy developer needs to know the answer to the question: "How comfortable am I with deploying a change right now?" To be comfortable at all times, Etsy adopted a range of tools and practices: mandatory IRC-based communication, developer virtual machines, continuous integration, one-click deployments, thorough application and system monitoring, blameless post-mortems, and on-call policies for both dev and ops teams.

Every developer has their own KVM (Kernel-based Virtual Machine) virtual machine, configured with Chef. The same Chef cookbooks used in production are also used on the developers' virtual machines, so each developer has a full Etsy stack of their own. Anyone can provision a virtual machine through Virtual Madness, a web application that automates the whole process.

On the continuous integration front, Daniel explained that Try is central to Etsy's process. Try lets developers test their changes in Jenkins, the CI tool used at Etsy, without having to commit to trunk. This keeps trunk clean, and therefore deployable, while still letting developers test their changes quickly and reliably. The CI cluster must be powerful enough to support 150 engineers and more than 14,000 test suite runs per day. Etsy uses LXC (Linux Containers) to parallelize the workload; the containers also provide the isolation needed to keep the executors from colliding with one another.
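The article does not describe how Etsy's CI cluster distributes work among its executors. As a minimal sketch, assuming each test suite has a known historical runtime, the following shows one common way to split suites across a fixed number of isolated executors (such as LXC containers) so they finish at roughly the same time: greedy longest-processing-time scheduling. The function name `shard_suites` and the suite names are hypothetical.

```python
import heapq
from typing import Dict, List


def shard_suites(durations: Dict[str, float], executors: int) -> List[List[str]]:
    """Assign test suites to executors, balancing total runtime per executor.

    Greedy longest-processing-time (LPT) scheduling: take suites from longest
    to shortest and give each one to the executor with the least work so far.
    """
    if executors < 1:
        raise ValueError("need at least one executor")
    shards: List[List[str]] = [[] for _ in range(executors)]
    # Min-heap of (seconds assigned so far, executor index).
    load = [(0.0, i) for i in range(executors)]
    heapq.heapify(load)
    # Longest suites first; ties broken by name so the result is deterministic.
    for suite in sorted(durations, key=lambda s: (-durations[s], s)):
        total, idx = heapq.heappop(load)
        shards[idx].append(suite)
        heapq.heappush(load, (total + durations[suite], idx))
    return shards
```

For example, `shard_suites({"checkout": 300, "search": 240, "listings": 180, "api": 120, "admin": 60}, 2)` returns `[["checkout", "api", "admin"], ["search", "listings"]]`, giving the two executors 480 and 420 seconds of work respectively.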

The deployment pipeline passes through Princess, Etsy's staging environment, before reaching production. Princess is, for all intents and purposes, the production environment, except that only Etsy employees have access to it. Deployinator, a deployment tool built and used by Etsy, provides one-click deployments.
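Deployinator's internals are not covered in the talk summary. The sketch below, assuming a deploy is simply an ordered list of stages that each report success or failure, illustrates the property that makes a staging environment like Princess useful: a change that fails in staging never reaches production, and the whole sequence runs from a single call. The stage functions and the `deploy` function are hypothetical placeholders, not Deployinator's API.

```python
from typing import Callable, List, Tuple

# A stage pairs a name with a function that returns True when the stage succeeds.
Stage = Tuple[str, Callable[[], bool]]


def deploy(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failure.

    Returns the names of the stages that succeeded, so the caller can see
    how far the change got (for example, staging but not production).
    """
    completed: List[str] = []
    for name, run in stages:
        if not run():
            print(f"stage {name!r} failed; halting deploy")
            break
        completed.append(name)
    return completed


# Hypothetical stage functions standing in for real build and deploy steps.
def build() -> bool:
    return True


def deploy_to_princess() -> bool:  # staging, reachable only by employees
    return True


def smoke_test_princess() -> bool:
    return True


def deploy_to_production() -> bool:
    return True


if __name__ == "__main__":
    # The "one click": a single call runs the whole pipeline.
    print(deploy([
        ("build", build),
        ("princess", deploy_to_princess),
        ("princess smoke test", smoke_test_princess),
        ("production", deploy_to_production),
    ]))
```

Because the loop breaks on the first failure, a broken smoke test on Princess leaves the production stage unexecuted.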

Config flags, also known as feature flags, are an integral part of the deployment process. Through its feature API, Etsy can run A/B tests and can completely enable or disable a feature, or individual variants of a given feature.
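Etsy's feature API is not shown in the article, so the following is a minimal, hypothetical sketch of a config flag with a master on/off switch and weighted A/B variants. `FeatureConfig`, `bucket`, and `variant_for` are illustrative names, not Etsy's API. Hashing the feature name together with the user ID gives each user a stable bucket, so a visitor keeps seeing the same variant across requests without any stored state.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FeatureConfig:
    """Hypothetical per-feature config: a master switch plus optional A/B variants."""
    enabled: bool = False
    # Variant name -> percentage of users; weights are expected to sum to 100.
    variants: Dict[str, float] = field(default_factory=dict)


def bucket(feature: str, user_id: str) -> float:
    """Map a (feature, user) pair to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    # The first 8 hex digits give an integer in [0, 2**32); scale it to [0, 100).
    return int(digest[:8], 16) / 2**32 * 100


def variant_for(feature: str, config: FeatureConfig, user_id: str) -> Optional[str]:
    """Return None if the feature is off for this user, else the variant to show."""
    if not config.enabled:
        return None  # feature completely disabled
    if not config.variants:
        return "on"  # feature completely enabled, no A/B test
    point = bucket(feature, user_id)
    cumulative = 0.0
    for name, weight in config.variants.items():
        cumulative += weight
        if point < cumulative:
            return name
    return None  # weights summed to less than 100: user falls in no variant
```

For instance, with `FeatureConfig(enabled=True, variants={"control": 50, "new_flow": 50})`, a given user ID always maps to the same one of the two variants, and flipping `enabled` to `False` turns the feature off for everyone at once.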

Monitoring is key to how Etsy's team builds the confidence to practice Continuous Delivery. Developers monitor their own features, and everyone has access to all the graphs through dashboards. Etsy's default policy is that everything that can be graphed is graphed. As the number of metrics grew steadily over time, Etsy built Kale to help detect anomalous patterns. All logs are available through Supergrep, a web-based log streamer that improves the signal-to-noise ratio of the logs.
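The article does not say which algorithms Kale uses. As an illustration of the general idea only, here is one of the simplest anomaly checks for a metric time series: flag any point that lies more than three standard deviations from the mean of the preceding window. The function name and the default window and threshold are assumptions, not Kale's internals.

```python
import statistics
from typing import List, Sequence


def anomalies(series: Sequence[float], window: int = 10, threshold: float = 3.0) -> List[int]:
    """Return indices of points more than `threshold` standard deviations
    away from the mean of the preceding `window` points."""
    flagged: List[int] = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all is unusual.
            if series[i] != mean:
                flagged.append(i)
        elif abs(series[i] - mean) > threshold * stdev:
            flagged.append(i)
    return flagged
```

With `data = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250, 100, 99]`, `anomalies(data)` returns `[10]`: only the spike to 250 is flagged. Once the spike enters the trailing window it inflates the standard deviation, which is one reason a single fixed rule like this is rarely enough on its own.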

IRC is the main communication tool throughout Etsy and is key to its collaborative culture. There are many chat rooms, each with a specific purpose. For instance, #warroom is reserved for outage-related conversation; it is used to coordinate the investigation, discuss countermeasures, and monitor the resolution. New engineers are encouraged to lurk in #warroom and the other chat rooms, which are considered good places to learn.

After each outage or near outage, everybody is invited to a post-mortem. Post-mortems are such a significant cultural event that even the finance and support teams can attend if they want to. They are meant to be a learning opportunity, so they are blameless. All the information related to a post-mortem is recorded in Morgue, another tool Etsy built specifically for post-mortem record keeping: dates, severity, IRC logs, graphs, and remediation actions.
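Morgue's actual schema is not described beyond the list of recorded information. A minimal sketch, assuming a record holds exactly those fields, might look like the following; the class names, field names, and the severity scale are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class RemediationAction:
    description: str
    done: bool = False


@dataclass
class PostMortem:
    """Hypothetical post-mortem record mirroring the fields listed above."""
    title: str
    started: datetime
    resolved: datetime
    severity: int  # assumed scale: 1 is the most severe
    irc_log_urls: List[str] = field(default_factory=list)
    graph_urls: List[str] = field(default_factory=list)
    remediations: List[RemediationAction] = field(default_factory=list)

    def duration_minutes(self) -> float:
        """How long the incident lasted, from start to resolution."""
        return (self.resolved - self.started).total_seconds() / 60

    def open_remediations(self) -> List[str]:
        """Remediation actions that still need to be done."""
        return [r.description for r in self.remediations if not r.done]
```

Keeping remediation actions as structured items rather than free text makes it easy to list the follow-ups that are still open after the meeting.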

There are on-call policies for operations, developers, payments, and support. Developers are usually on call one week in four, on a rotating basis. The policy aims to keep everyone aware of the day-to-day issues facing the site, so that they can be taken into account when developing new features or improving existing processes.
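A weekly rotation like this reduces to simple arithmetic: count the whole weeks since the rotation started and take that number modulo the team size. The sketch below assumes a fixed four-person rotation with hypothetical names; swaps, holidays, and similar adjustments are not handled here.

```python
from datetime import date
from typing import Sequence


def on_call(rotation: Sequence[str], start: date, day: date) -> str:
    """Return who is on call on `day` for a weekly rotation that began on `start`.

    With four people in `rotation`, each person is on call one week in four.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    if day < start:
        raise ValueError("day precedes the start of the rotation")
    weeks_elapsed = (day - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]
```

With `rotation = ["ana", "ben", "cara", "dev"]` and a start date of Monday 1 January 2024, `on_call(rotation, date(2024, 1, 1), date(2024, 1, 8))` returns `"ben"`, and each person comes around again every four weeks.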

Etsy has about 60 million visits and 1.5 billion page views per month.

Community comments

  • Seems excessive

    by Erik Mathis,

    Seems like Etsy is a disorganized mess and it's trying to use tech to make up for it instead of better planning. It's an e-retailer, for Pete's sake! 50 times a day is so over the top that I can't even understand that mentality. FYI, Jenkins is NOT QA. I feel like 50 deploys a day is to fix missed bugs. I could see a few incremental releases per day.

  • Scalability (downwards, not upwards)

    by Martin Goodwell,

    The technology used is pretty impressive to me.
    As I also have a background in working with small teams and small companies: what do you think is the minimum team size required to take full advantage of such an approach?

  • Other continuous deployment practices at Etsy

    by Akond Rahman,

    Wow! Great talk... I was wondering whether Etsy uses dogfooding and a packaging tool to deploy the software changes. What about A/B testing?

  • Valuing competing organizations on deployments per day?

    by Frank Cohen,

    I felt the same as Erik about the deployments-per-day at Etsy. But then I started to map out what I've been hearing from other modern dev shops: Facebook does 2 per day, WordPress does 150 per day, and Citibank does 2 per month. In a world that has Agile and Continuous Integration, and DevOps and Configuration Management, does it make sense to value competing companies on the number of deployments they can make per day? -Frank