Engineering for the Long Term at Google

Astrid Atkinson, director at Google, drew on their experiences over the last decade to present some rules and advice on engineering for the long term. Attendees of the Velocity Conference 2015 in Santa Clara learned that it's crucial to imagine that you're going to be wildly successful, that complexity should be managed rather than eliminated, and that the focus should be on scaling systems, not teams.

If you're going to be successful, then complexity will keep increasing no matter what, so the focus should be on managing that complexity. Atkinson says that the growth that your product can drive should never be curtailed: don't refrain from doing something just because it's too complex. On the other hand, you should keep a close eye on what's costing you time and work towards removing those inefficiencies.

Atkinson highlighted how good teams and good people are invaluable, but very hard to find. If you're going to be successful in the long run, your systems will grow exponentially, but your teams will grow sub-linearly. You'll make large investments in your teams, such as recruiting and training, so you'll want to keep them. Atkinson believes that "your team IS your service". Don't burn your people out. Manage the interrupt load. Break down the silos: systems need both dev and ops.

Enabling growth requires moving away from bespoke solutions towards as much standardization as possible. Store configurations in the same way everywhere, standardize on naming, stop managing individual machines. Atkinson mentioned several examples at Google. For many years now, every Google server has contained a small HTTP server that serves a status page. This status page provides basic info, such as the traffic that's hitting the server or when it was built. Borg, Google's cluster manager, helped Google's engineers manage and run their own services by abstracting away the individual machines.
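To make the status page idea concrete, here is a minimal sketch in Python of a server process exposing basic facts about itself over HTTP. It is not Google's implementation: the `/status` path, the JSON format, the field names, and the names `status_snapshot`, `StatusHandler` and `make_status_server` are all assumptions made for illustration.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values; a real build pipeline would stamp the build time in.
BUILD_TIMESTAMP = "unknown"
STARTED_AT = time.time()
request_count = 0


def status_snapshot(build_timestamp, requests_served, started_at, now):
    """Return the basic facts a status page might expose."""
    return {
        "build_timestamp": build_timestamp,
        "requests_served": requests_served,
        "uptime_seconds": int(now - started_at),
    }


class StatusHandler(BaseHTTPRequestHandler):
    """Serves the snapshot as JSON on /status (path is an assumption)."""

    def do_GET(self):
        global request_count
        request_count += 1  # crude traffic counter, for illustration only
        if self.path != "/status":
            self.send_error(404)
            return
        snapshot = status_snapshot(
            BUILD_TIMESTAMP, request_count, STARTED_AT, time.time()
        )
        body = json.dumps(snapshot).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_status_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), StatusHandler)
```

Calling `make_status_server(8080).serve_forever()` would start the page on port 8080. The point of the sketch is the standardization: if every process exposes the same fields at the same path, tooling can inspect any service without service-specific knowledge.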

Engineering for the long term also means paying close attention to maintainability. If the development teams are going to grow a lot, it's important to consider how to operate and maintain all those systems. Two rules stand out. First, each service must look after itself and its communications with others. Second, use shared infrastructure. If you have several services that do similar things, consolidate them to avoid wasteful overhead. When consolidating, Atkinson likes Dan McKinley's advice on boring infrastructure choices: prefer the most stable thing. You'll also have to think about migrating the workload, as it can be a long and risky endeavor. Atkinson favors moving the biggest customer first, to bring the risk forward.

Atkinson also urged everyone: "don't let the weeds get higher than the garden". That means investing in support tools for the engineering processes, e.g., build, testing, release and monitoring ("from the user's perspective"). It also means that tasks that are executed more than twice should be automated.

Astrid Atkinson's talk is available for online viewing.
