How ING Bank Does SRE

by Manuel Pais on Dec 30, 2017. Estimated reading time: 2 minutes

Janna Brummel and Robin van Zijll, from ING Netherlands, spoke at the Velocity conference in London about how poor availability of their internet banking systems prompted the bank to adopt an SRE culture. A centralized SRE team was set up in the Netherlands to provide tooling, consulting and education on reliability to product teams (known internally as BizDevOps squads).

By mid-2017 ING's metrics showed that the availability of their internet banking retail systems was down to 96.84%, in contrast to other systems (iDEAL retail and mobile banking retail) that were closer to the ideal 99.99% mark. Factors leading to this outcome included: lack of monitoring ownership by product teams; a centralized alerting system that triggered only at a very high level (system down), so that diagnosing a major incident and delegating it to engineers took a long time (69 minutes on average); infrequent post-incident reviews and sharing of lessons learned; and a lack of availability insights at component level (results were aggregated at service level only, which contributed to product teams not feeling directly responsible).
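To put the two availability figures in perspective, the following minimal sketch converts an availability percentage into the downtime it implies. The 30-day window is an assumption made for illustration; the talk summary does not state the period over which ING measured availability.

```python
def downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime implied by an availability percentage.

    period_days is an assumed measurement window, not a figure from ING.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (96.84, 99.99):
        minutes = downtime_minutes(pct)
        print(f"{pct}% over 30 days: {minutes:.1f} min ({minutes / 60:.1f} h) of downtime")
```

Under that assumption, 96.84% corresponds to roughly 22.8 hours of downtime in a month, while 99.99% corresponds to about 4.3 minutes.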

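The centralized SRE team has a consulting role only (they do not run the services and are not on call for them), but it also acts as a platform team, providing tooling and internal services to help the product teams run their systems and improve their reliability. Planning and prioritization of the team's backlog is guided by the service reliability hierarchy as defined in Google's SRE book, a pyramid with monitoring at the base, followed by incident response, postmortem and root cause analysis, testing and release procedures, capacity planning, development and, at the top, product.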

So far, the SRE team has focused mostly on the bottom three layers in the pyramid. In terms of monitoring and incident response, they are building shared tooling based on Prometheus, Grafana and Mattermost (ChatOps). They facilitate postmortems by the product teams and provide consulting on how to identify and fix reliability issues. Brummel and van Zijll mentioned that it took time and concerted effort to remove the existing blame culture around major incidents. They advise investing time in creating awareness and setting the scene before increasing the frequency of incident reviews; otherwise the reviews can backfire.
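As an illustration of how a Prometheus and Mattermost based ChatOps flow could route alerts to the owning team, here is a minimal sketch that turns an Alertmanager-style webhook payload into Mattermost incoming-webhook messages. This is not ING's implementation: the `squad` and `component` label names, the `squad-<name>` channel naming and the function name are all hypothetical, and the sketch only builds the message bodies without sending them.

```python
def alerts_to_mattermost(payload: dict) -> list:
    """Build one Mattermost incoming-webhook message per alert.

    Expects the general shape of an Alertmanager webhook payload:
    {"alerts": [{"status": ..., "labels": {...}, "annotations": {...}}]}.
    """
    messages = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Hypothetical labels: which squad owns the component that alerted.
        squad = labels.get("squad", "unassigned")
        component = labels.get("component", "unknown")
        status = alert.get("status", "firing").upper()
        name = labels.get("alertname", "alert")
        summary = annotations.get("summary", "no summary")
        messages.append({
            # Assumed convention: one chat channel per squad.
            "channel": f"squad-{squad}",
            "text": f"**[{status}] {name}** on `{component}`: {summary}",
        })
    return messages


example_payload = {
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "HighErrorRate", "squad": "payments", "component": "login-api"},
        "annotations": {"summary": "5xx rate above threshold"},
    }]
}
example_messages = alerts_to_mattermost(example_payload)
```

Routing on a component-level label rather than on a single "system down" signal is one way to address the long diagnosis and delegation times described earlier, since the alert lands directly with the team that owns the failing component.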

All these changes were rolled out on demand, not as a "big bang" initiative, allowing product teams to decide whether to switch to the tooling and practices proposed by the SRE team. The SRE team is also in the process of scaling from one team with a few engineers to a larger community of practice, with multiple SRE teams across different countries (currently three teams in the Netherlands, one in Spain and one in Australia). Demos and internal discussions on SRE topics help build the community.

Brummel and van Zijll's takeaways so far in their SRE journey include: value an SRE mindset over specific skills when hiring; give the SRE team a product owner to protect it from conflicting priorities; be ready to spend a lot of time explaining and promoting SRE to product teams; make sure the tooling provided is of commercial quality in terms of usability and alleviates actual pain points of its users; and consider scalability and ownership in the tooling strategy.
