Randy Shoup shared his experiences scaling services at Google and eBay with the QCon London audience, offering advice on building and operating services. According to Shoup, a successful services strategy requires end-to-end service ownership, decentralized decision-making, and standardization efforts focused on communication protocols and supporting infrastructure.
Shoup finds that most system architectures evolve from monoliths into many cooperating services; Twitter, eBay and Amazon have all followed this path. Service evolution is not driven by centralized, top-down design. Instead, it follows a process of natural variation and selection: services are extracted from other services to solve a more general problem. Shoup cited Gmail, BigTable and Megastore as examples of this process. Services that are no longer used should be deprecated, but their underlying technology can live on: Google Apps reused parts of Google Wave, such as live editing by multiple users.
Google has no architect title or role, and there is no central approval for technology decisions: local teams make their own choices. Even so, this decentralized process produces systems that compose well. Shoup gave the example of Cloud Datastore, Google's NoSQL offering, which is built on top of Megastore. Megastore is in turn built on top of BigTable, Colossus (the next-generation Google File System) and Borg, Google's cluster management system. Google built all of these systems in a decentralized way, yet they ended up being highly composable.
A few years ago, eBay had an Architecture Review Board, a centralized body that approved large projects. But Shoup says the board was usually involved too late to change anything, which strengthens the case for decentralization. He believes a better way to benefit from experienced engineers is to have them encode their knowledge in a reusable service or tool that others can learn from.
Standardization efforts
Within an ecosystem of hundreds or thousands of services, some standardization is important. As mentioned several times at QCon, standardization efforts should concentrate on communication and infrastructure. Communication standards cover things such as network protocols and data formats. Infrastructure standards are all about source control, configuration management, cluster management, monitoring and alerting.
In a decentralized environment, standards aren't centrally mandated; they become standards by being better than the alternatives, rather than by fiat. The best way to promote a standard is to encourage people to adopt it by making it easy to do the right thing. Teams can help the adoption process by creating libraries, making code searchable and doing code reviews.
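As an illustration of "making it easy to do the right thing", here is a minimal sketch of a shared library that encodes a communication standard once so that every team emits the same wire format. The envelope fields, the version number and the function names `encode_message` and `decode_message` are all hypothetical; nothing here describes Google's or eBay's actual formats.

```python
import json
import time
import uuid

# Hypothetical shared library: encodes an agreed-upon message envelope so that
# every team produces the same wire format without re-implementing it.
ENVELOPE_VERSION = 1  # assumption: the ecosystem agreed on a versioned JSON envelope


def encode_message(service: str, payload: dict) -> str:
    """Wrap a payload in the standard envelope and serialize it to JSON."""
    envelope = {
        "version": ENVELOPE_VERSION,
        "source": service,
        "request_id": str(uuid.uuid4()),  # lets monitoring correlate calls across services
        "timestamp": time.time(),
        "payload": payload,
    }
    return json.dumps(envelope)


def decode_message(raw: str) -> dict:
    """Parse a message, rejecting envelopes this library version does not understand."""
    envelope = json.loads(raw)
    if envelope.get("version") != ENVELOPE_VERSION:
        raise ValueError(f"unsupported envelope version: {envelope.get('version')}")
    return envelope
```

A team adopting the library gets request IDs and version checks for free, which is the kind of incentive that lets a standard win on merit rather than by mandate.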
Building and operating services
An effective service should be single-purpose and simple, which requires a well-defined interface and isolated persistence. In addition, the service owner should have end-to-end ownership, from design to retirement. These requirements let the owner focus on the service's bounded context, its clients and the services it depends on. Service owners are usually teams of 3 to 5 people, which keeps them agile and nimble.
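A minimal sketch of what "a well-defined interface and isolated persistence" can look like in code, assuming a hypothetical user-profile service: clients depend only on the abstract `ProfileService` contract, while the storage behind it stays private to the owning team. The class names are illustrative, and the in-memory dict stands in for a real database.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ProfileService(ABC):
    """The service's whole public contract: clients depend only on these methods."""

    @abstractmethod
    def get_display_name(self, user_id: str) -> Optional[str]: ...

    @abstractmethod
    def set_display_name(self, user_id: str, name: str) -> None: ...


class InMemoryProfileService(ProfileService):
    """One possible implementation. Its storage is private: no other service
    reads or writes it directly, so the owning team can change the schema
    or the database without coordinating with clients."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}  # isolated persistence (stand-in for a real database)

    def get_display_name(self, user_id: str) -> Optional[str]:
        return self._store.get(user_id)

    def set_display_name(self, user_id: str, name: str) -> None:
        if not name.strip():
            raise ValueError("display name must not be empty")
        self._store[user_id] = name.strip()
```

Because clients never touch `_store`, the owners can later swap it for a different database without breaking anyone, which is what makes end-to-end ownership practical.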
The relationship between a service and its consumers should be a vendor-customer relationship: cooperative but structured, with clear ownership and division of responsibility. Importantly, the customer can choose whether or not to use the service. Maintaining service stability is crucial to keeping that trust. A service should never break its clients' code, even if that means maintaining many interface versions and deployments. A service should also have an explicit deprecation policy so that its users can plan ahead.
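One way to honor "never break clients" while still evolving is to keep old interface versions running side by side with new ones and to announce their retirement in advance. The sketch below assumes a hypothetical pricing endpoint with `v1` and `v2` contracts and an invented sunset date; the deprecation notice is attached to every `v1` response so callers can plan their migration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Version:
    handler: Callable[[dict], dict]
    sunset: Optional[date] = None  # announced retirement date; None means not deprecated


def price_v1(req: dict) -> dict:
    # Original contract: price in integer cents.
    return {"price_cents": 1999}


def price_v2(req: dict) -> dict:
    # New contract adds a currency field; v1 clients are unaffected.
    return {"amount": "19.99", "currency": req.get("currency", "USD")}


VERSIONS = {
    "v1": Version(price_v1, sunset=date(2030, 1, 1)),  # hypothetical sunset date
    "v2": Version(price_v2),
}


def handle(version: str, req: dict) -> dict:
    """Dispatch to the requested interface version; old versions keep working
    until their announced sunset, and callers are told about it in advance."""
    v = VERSIONS.get(version)
    if v is None:
        raise KeyError(f"unknown API version {version!r}")
    response = dict(v.handler(req))
    if v.sunset is not None:
        response["deprecation"] = f"{version} will be retired on {v.sunset.isoformat()}"
    return response
```

The cost of this approach is running several versions at once; the benefit is that each customer decides when to migrate, which keeps the vendor-customer relationship voluntary.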
On the subject of operating services, Shoup emphasized the need to provide predictable performance, another recurring theme at QCon. Predictable performance is much better than good average performance, because clients can rely on it, which makes consuming the service easier.
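A common way to make "predictable" measurable is to look at tail percentiles such as p99 instead of the mean. The sketch below uses the nearest-rank percentile definition and two made-up latency distributions (the numbers are illustrative, not from the talk) to show how a service can win on average latency yet lose badly on predictability.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for two services, 100 requests each.
steady = [20.0] * 100             # always 20 ms
spiky = [5.0] * 98 + [300.0] * 2  # usually fast, occasionally very slow

for name, latencies in (("steady", steady), ("spiky", spiky)):
    print(f"{name}: mean={statistics.mean(latencies):.1f} ms, p99={percentile(latencies, 99):.1f} ms")
```

Here the spiky service has roughly half the mean latency of the steady one, but its p99 is fifteen times worse, and a client calling it many times per page will hit those slow requests regularly.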
Several techniques make incremental deployments easier. Shoup suggests the use of canary systems, staged rollouts and rapid rollbacks. At eBay, feature flags were commonly used, an effective approach that, according to Shoup, has been "rediscovered many times" over the years.
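A minimal sketch of a percentage-based feature flag, which combines the three techniques above: a small percentage acts as a canary, raising it step by step is a staged rollout, and setting it to 0 is an instant rollback without redeploying. The `FeatureFlag` class and the hashing scheme are hypothetical and are not eBay's implementation.

```python
import hashlib


class FeatureFlag:
    """Hypothetical percentage-based flag: a feature is enabled for a stable
    subset of users, and the rollout percentage can be raised step by step
    (canary, then staged rollout) or set to 0 for an instant rollback."""

    def __init__(self, name: str, percent: int = 0) -> None:
        self.name = name
        self.percent = percent

    def _bucket(self, user_id: str) -> int:
        # Hash flag name + user id so each user lands in the same bucket (0-99)
        # on every request, and different flags get independent buckets.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def is_enabled(self, user_id: str) -> bool:
        return self._bucket(user_id) < self.percent

    def set_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent = percent
```

Typical use: create `FeatureFlag("new-checkout", percent=1)` as a canary, call `set_percent(25)` and then `set_percent(100)` as confidence grows, or `set_percent(0)` to roll back. Because bucketing is deterministic, users who already saw the feature keep seeing it as the percentage increases.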
Like many others at QCon, Shoup believes there's no such thing as too much monitoring, though the same cannot be said about alerting.
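The distinction between monitoring and alerting can be sketched as two different treatments of the same data: every measurement is recorded, but a page is sent only when a problem persists. The `ErrorRateAlert` class, the threshold and the "N consecutive windows" rule below are assumptions chosen for illustration, not a policy described in the talk.

```python
from collections import deque


class ErrorRateAlert:
    """Hypothetical policy: every window's error rate is recorded (monitoring),
    but an alert fires only when the rate stays above a threshold for several
    consecutive windows, so a single blip does not page anyone."""

    def __init__(self, threshold: float, windows: int) -> None:
        self.threshold = threshold
        self.recent = deque(maxlen=windows)  # last N windows used for alerting
        self.history: list[float] = []       # full record kept for dashboards

    def observe(self, errors: int, requests: int) -> bool:
        rate = errors / requests if requests else 0.0
        self.history.append(rate)
        self.recent.append(rate)
        # Fire only when the window is full and every recent rate breached the threshold.
        return len(self.recent) == self.recent.maxlen and all(
            r > self.threshold for r in self.recent
        )
```

With `ErrorRateAlert(threshold=0.05, windows=3)`, an isolated spike shows up in `history` for later investigation but does not fire; only three consecutive windows above 5% trigger an alert.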