The Technology Stack at Medium

Medium is an online publishing platform developed by Twitter co-founder Evan Williams. Launched in 2012, it now has over 60 million unique monthly visitors. The technology stack behind the site includes deployment to AWS, applications and services written in NodeJS and Go, data storage with DynamoDB, and Amazon Redshift as their data warehouse.

Dan Pupius, Medium's former head of engineering, has published a blog post outlining most of these choices. With several millennia's worth of reading having taken place on the site, it's these choices that help it operate at scale.

Medium's environment is currently deployed within an AWS VPC (Virtual Private Cloud), and the underlying infrastructure is configured using Ansible. The applications themselves run on EC2.

The team also runs a service-oriented architecture of around a dozen production services. Whether a piece of functionality becomes a new service or is integrated into an existing one depends on characteristics such as decoupling and cohesion.

The main language of choice is NodeJS, a major advantage of which has been the ability to share code between the server and the client. The team has hit performance problems when the single-threaded event loop is blocked, but has worked around them by running multiple instances: traffic for expensive endpoints is routed to specific instances, which prevents other requests to the services from hanging.
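The routing workaround can be sketched as a small path-prefix router. This is a minimal illustration only: the blog post does not describe Medium's actual routing mechanism, so the round-robin selection, the `PoolRouter` class, the `/export` prefix and the instance names such as `heavy-1` are all hypothetical.

```python
import itertools
from typing import Dict, List


class PoolRouter:
    """Send requests for expensive endpoints to a dedicated pool of instances.

    Hypothetical sketch: each pool is a list of instance names, and requests
    are spread round-robin within whichever pool a path maps to.
    """

    def __init__(self, default_pool: List[str], dedicated: Dict[str, List[str]]) -> None:
        self._default = itertools.cycle(default_pool)
        # Check longer prefixes first so the most specific match wins.
        self._prefixes = sorted(dedicated, key=len, reverse=True)
        self._dedicated = {p: itertools.cycle(pool) for p, pool in dedicated.items()}

    def route(self, path: str) -> str:
        for prefix in self._prefixes:
            if path.startswith(prefix):
                return next(self._dedicated[prefix])
        # Cheap requests share the default pool, so an expensive endpoint that
        # blocks an event loop on a dedicated instance cannot stall them.
        return next(self._default)


if __name__ == "__main__":
    router = PoolRouter(
        default_pool=["web-1", "web-2"],
        dedicated={"/export": ["heavy-1"]},  # hypothetical expensive endpoint
    )
    for path in ["/p/abc", "/export/123", "/p/def", "/export/456"]:
        print(path, "->", router.route(path))
```

Keeping the expensive traffic on its own instances means a long-running handler only blocks the event loops in that pool, while ordinary requests continue to be served by the default pool.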

Pupius also states that the remaining auxiliary services are written in Go, because it makes them easy to package, build and deploy. He cites his liking for type safety and the language's lack of verbosity as further reasons for using it:

Personally, I’m a fan of using opinionated languages in a team environment; it improves consistency, reduces ambiguity, and ultimately gives you less rope to hang yourself.

Although DynamoDB is the main data store, Pupius mentions that there have been problems along the way, the main one being the hot key issue. This occurs when, despite the database being distributed, the data on a single node ends up being queried far more heavily than the rest, negating the performance benefits that would otherwise come from partitioning the data. Medium has mitigated this by using Redis as a cache, greatly reducing the number of queries that actually reach the database. Medium is now experimenting with Amazon Aurora for newer data.
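The hot key problem and the caching mitigation can be illustrated with a small simulation. This is a sketch under simplifying assumptions: keys are assigned to partitions with an MD5 hash modulo the partition count (DynamoDB's real partitioning is internal and more involved), a plain in-process dictionary with a time-to-live stands in for Redis, and the key names and request counts are made up for illustration.

```python
import hashlib
import time
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def partition_for(key: str, partitions: int) -> int:
    """Simplified hash partitioning: the same key always lands on the same partition."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % partitions


def partition_load(keys: Iterable[str], partitions: int) -> Counter:
    """Count how many reads each partition would serve."""
    return Counter(partition_for(k, partitions) for k in keys)


class CacheAside:
    """Cache-aside reads: serve from the cache, go to the database only on a miss or expiry.

    Stand-in for Redis: an in-process dict, not a real cache server.
    """

    def __init__(self, load: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._load(key)  # miss or expired: read through to the database
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    # Hypothetical workload: 900 reads of one popular post, 100 reads of other posts.
    reads = ["post:popular"] * 900 + [f"post:{i}" for i in range(100)]
    load = partition_load(reads, partitions=8)
    print("busiest partition serves", max(load.values()), "of", len(reads), "reads")

    db_reads: Counter = Counter()

    def load_from_db(key: str) -> str:
        db_reads[key] += 1
        return f"<body of {key}>"

    cache = CacheAside(load_from_db)
    for key in reads:
        cache.get(key)
    print("reads reaching the database:", sum(db_reads.values()))
```

Without the cache, the partition holding `post:popular` serves at least 900 of the 1,000 reads regardless of how many partitions exist; with it, only the first read of each distinct key (until its entry expires) reaches the database.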

For data describing relationships, such as those between people, posts and tags, Medium currently uses the graph database Neo4j, running as one master and two slave instances. For its data warehouse, Medium uses Amazon Redshift, with Apache Spark on top for querying.

Pupius states that, using Jenkins, testing, building and deploying to the staging environment is done within 15 minutes, and production deployments can happen around five times a day:

We embrace continuous integration and delivery, pushing on green as fast as possible.

The team at Medium is cross-functional, meaning that any engineer is able to work on any part of the stack. Pupius believes this has led to stronger engineers.

The full blog post, which covers an even wider slice of the stack than this article, can be read online. Medium itself is free to use.
