Technical Architecture in Banking and Gaming

Posted by Ben Evans on Sep 24, 2014. Estimated reading time: 9 minutes

In this article, I'll start by considering some aspects of the enterprise architecture of financial systems and compare them to some characteristics of gaming environments that I've observed as a player.

In the second half, I'll go on to discuss some of the technology and best practices that have grown up in the development of cloud deployed architectures. Finally, from these case studies, I'll look into my crystal ball and imagine some of the gaming possibilities that the synthesis of all of these techniques could unlock.

First, a caveat about financial institutions. The larger firms are huge and can comprise dozens of business lines and a truly bewildering array of systems, of many different types and widely varying non-functional characteristics. So some of the manner in which I'm going to describe them is, of necessity, an oversimplification. It's possible for an engineer to spend his or her entire career in an investment bank and still never work on more than a fraction of the systems the bank has.

So when I discuss financial institutions and their systems in this article I am really focusing on the client-facing parts of investment banks. These groups and projects are usually very concerned with the reliability and stability of their systems. This is very much in their nature - the banks are heavily regulated players in a highly competitive and lucrative market.

A typical example of such a system might be a client order management system. This would accept orders for stocks and shares (or commodities or foreign currencies or other financial instruments) on behalf of clients and place these orders directly on an electronic market, with basically no manual intervention by bank staff under normal conditions.

Bank clients who use such a system usually have no loyalty to a particular bank. Many clients will have comparable client accounts with several different banks to provide market access.

If the order management system is ever unavailable - even for a few seconds - then customers will simply switch to a competitor bank to fill their orders, and they may not return to the original provider for months. This is true even at the busiest periods of the market.

This means that these types of banking systems must be engineered for very high reliability, as their customer base is extremely fickle and a single lost customer can seriously impact the profit a division of the bank can make.

Experience in the market has enabled banks to achieve these levels of reliability, but it has come at a significant cost. This cost is both in terms of software and hardware to ensure redundancy and monitor the system, and also in terms of the number of support engineers required to keep the systems running at the required levels of reliability.

By contrast, when gamers have formed a major emotional attachment to a particular game they can be much more tolerant of outages. For popular games which deploy regular, fairly large patches (often a few hundred MB in size), slow download times seem to be mostly accepted by users - and no mass exodus to another game occurs.

Even the occasional crash of a server seems to be regarded as a fact of life. As long as it doesn't happen too often, gamers seem to regard crashes - and even the loss of a small amount of game state and experience - as acceptable.

Another clear difference between banking and gaming systems arises from the difference in user impact patterns. No matter how hardcore the gamer, their overall impact on a system - and their consumption of system resources - will be limited.

In banking certain clients are much more important than others - and frequently the important "whale" clients will have the capability to consume significant amounts of a system's capacity and processing power.

This leads to a situation where the sharding pattern naturally works for games because the individual gamers can be efficiently divided into roughly equal piles. In banking this can be far less applicable or require significantly more work to implement in a useful manner.
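To make the sharding observation concrete, here is a minimal Python sketch (the player IDs and shard count are purely illustrative) of why interchangeable gamers divide so cleanly: hashing each player ID spreads roughly equal-weight users across roughly equal buckets, something that fails for banking when a single "whale" client dominates a shard.

```python
import hashlib

def shard_for(player_id: str, num_shards: int) -> int:
    """Deterministically map a player ID to one of num_shards buckets."""
    digest = hashlib.md5(player_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Because each gamer consumes roughly similar resources, a uniform hash
# yields shards of roughly equal load.
assignments = [shard_for(f"player-{i}", 4) for i in range(1000)]
```

The same routine applied to banking clients would balance *counts* but not *load*, which is why the pattern needs significantly more work there.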

One last comparison between banking and gaming tech - one area where both have seen significant work in optimisation is around networking stacks. Latency and bandwidth in particular are issues that are potentially very relevant for both gaming and banking.

Since leaving finance I've become involved with some interesting cloud-based startup projects and seen first-hand some fascinating technologies and practices emerging - some of which seem relevant to the ways in which gaming infrastructure could potentially evolve.

The architectures that we want to build in the cloud should have three main non-functional characteristics, beyond basic fitness for purpose (actually performing the tasks required of them):

  • Redundancy - the architecture should be able to withstand the loss of any individual server. In more advanced use cases, the loss of an entire data centre (or even a whole IAAS region) should not cause service degradation.
  • Recoverability - the system should automatically recover to a good state when a transient outage is over.
  • Reproducibility - the system should have sufficient logging and monitoring that after an outage has occurred, the problem can be reproduced, analysed for a root cause and then fixed so it can't recur.
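As an illustration of the recoverability property, here is a minimal Python sketch - the failing operation is simulated - of a client that retries through a transient outage with exponential backoff, returning to a good state automatically once the outage ends:

```python
import random
import time

def call_with_recovery(operation, max_attempts=5, base_delay=0.1):
    """Retry a failing operation with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Back off exponentially, with jitter to avoid a thundering herd
            # of clients all retrying at the same instant.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Simulate a transient outage: the first two calls fail, the third succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"

result = call_with_recovery(flaky)
```

Logging each failed attempt (rather than swallowing it) is what links this property back to reproducibility: the root cause can be analysed after the fact.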

With these capabilities in mind, I sometimes find it helpful to regard the evolution of cloud technology and best practices to date as being composed of two distinct, but overlapping, phases.

The first is the transition from managed hosting to Infrastructure-as-a-Service (IAAS), characterized by the development of services which offer APIs for provisioning and command and control. Without the presence of such interfaces, it's a real stretch to consider a solution "Cloud" in any meaningful sense.

In addition to the availability of provisioning APIs, the other technology that I regard as being typical of the first phase is the capability to relocate virtual instances to different physical hardware in a manner transparent to the user of the virtual instance.

The combination of these two capabilities - provisioning API and transparent relocation - starts to provide some of the potential benefits that the cloud offers. These benefits are usually stated in terms of elasticity of scaling, compute as a commodity to be purchased by the hour and potentially greater reliability.
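To sketch what "compute as a commodity" looks like in code, here is a hedged Python illustration using an entirely hypothetical in-memory client - real IAAS providers expose equivalent create/terminate calls over REST, with different names and authentication - showing elastic scaling driven by a provisioning API:

```python
class IaasClient:
    """Hypothetical provisioning API client (names are illustrative;
    real providers differ)."""
    def __init__(self):
        self.instances = set()
        self._next_id = 0

    def create_instance(self, image: str) -> str:
        self._next_id += 1
        instance_id = f"i-{self._next_id:04d}"
        self.instances.add(instance_id)
        return instance_id

    def terminate_instance(self, instance_id: str) -> None:
        self.instances.discard(instance_id)

def scale_to(client: IaasClient, image: str, desired: int) -> None:
    """Elastic scaling: provision or terminate until the fleet matches demand."""
    while len(client.instances) < desired:
        client.create_instance(image)
    while len(client.instances) > desired:
        client.terminate_instance(next(iter(client.instances)))

cloud = IaasClient()
scale_to(cloud, "game-server-v1", 10)   # scale up for the evening peak
scale_to(cloud, "game-server-v1", 3)    # scale back down off-peak
```

The point is the shape of the interaction, not the API details: capacity becomes a number you set, purchased by the hour.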

The second phase is perhaps best characterized by the phrase "Servers are livestock, not pets". Traditionally, systems administrators hand-built servers to order. In such an environment, it is very difficult (even with scripts and hand-rolled automation) to ensure that two servers are built in precisely the same way.

Worse, even if the servers have been built identically, the problem of verifying this fact remains. Over time this problem only gets worse, as servers are individually upgraded and personally cared for by the sysadmins. If an important server starts to have serious problems it is often nursed back to health like a beloved family pet.

The second age of Cloud really started with the rise of techniques like Continuous Deployment and the Devops movement. Technologies such as Puppet and Chef allow the automated building of uniform servers from scratch, in a way which emphasizes rebuild and redeploy over extensive manual patching. This is the basis of the approach which tends not to value an individual instance very highly, treating them effectively as livestock.
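As a sketch of the "livestock" approach, a minimal Puppet manifest declares the desired state of a server rather than the steps to build it; any node built from the same manifest converges to the same configuration (the package and service names here are illustrative):

```puppet
# Declarative server definition: every node applying this manifest
# ends up identical, and re-applying it is a no-op.
package { 'nginx':
  ensure => installed,
}

service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],
}
```

Because the manifest is idempotent and version-controlled, "verifying two servers are identical" reduces to checking they run the same manifest revision.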

Interestingly, the financial industry had long had a need to deploy large numbers of servers and to be insouciant in the event of any particular server dying. Morgan Stanley are one of the very few investment banks to speak relatively openly about aspects of their infrastructure. They are on record as having tens of thousands of Unix servers across over 30 locations as early as 1995 (Gittler, Moore and Rambhaskar, LISA 95) - and this was to grow to hundreds of thousands of machines over time.

However, despite the existence of capable infrastructure technology almost 20 years ago, these techniques did not become widespread until relatively recently for two reasons:

  1. The technology was purely proprietary and in many cases, rather tightly bound to a specific company's problem domain.
  2. Few companies really had a need to manage and orchestrate that much infrastructure.

The proprietary technologies that highly capable banks developed did provide a foreshadowing of modern large-scale techniques, and so it is no surprise that when companies such as Google began to appear, they used those banks as a primary source of talent.

The development of open-source configuration and management solutions such as Chef and Puppet were to prove key to this second phase of cloud techniques, arriving as they did at a time when more and more companies were discovering the potential opportunities that cheap large-scale compute offers.

Looking into the future, containerization is one obvious next step which is starting to emerge. The idea is to ship a self-contained application deployment unit which prevents Dependency Hell and which is fully functional when deployed onto a basic application host.

The first viable product which enables this is Docker, which makes use of Linux Containers (LXC) to provide isolated application environments running on a union mount filesystem. Docker has ambitious aims, but is still really quite immature and should not be deployed in production by teams who can't cope with some rough edges. However, Docker has a decent sized (and growing) community and support from several major vendors, including Red Hat and Google.
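To make the idea of a self-contained deployment unit concrete, here is a minimal Dockerfile sketch (the jar name, port and package versions are purely illustrative, chosen to fit the 2014-era tooling discussed above):

```dockerfile
# Everything the application needs - base OS, runtime, and the app itself -
# is declared in one place, avoiding Dependency Hell on the host.
FROM debian:jessie
RUN apt-get update && apt-get install -y openjdk-7-jre-headless
COPY game-server.jar /opt/game-server.jar
EXPOSE 7777
CMD ["java", "-jar", "/opt/game-server.jar"]
```

The resulting image is fully functional on any basic host running the Docker daemon, which is exactly the containerization promise described above.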

Docker may yet succeed as the dominant technology in this space - or additional credible equivalent competitive products may appear (which would be similar to what happened with Devops and configuration management as additional tooling choices began to emerge).

However, whatever the competitive landscape ends up looking like, the idea of containerization as a deployment method is compelling. For teams which have adopted it, there are clear benefits in terms of thinking about architecture and application packaging.

Finally, let's turn to the question of to what extent the deployment of cloud techniques can point the way to more efficient and reliable gaming infrastructure.

There are two main benefits that could derive from better game infrastructure: sharply reduced running costs for game developers, and more reliable infrastructure with less sharding.

The economics of the cloud work in favour of game producers - because they alleviate upfront costs - there's no need to build datacentres which might sit idle for a long time if a game fails to immediately take off.

The benefit of this to players is immense. If a major cost component of games (potentially as much as 10% of the operational costs of running a game) can be reduced and made much more scalable, then this opens up the market for more indie games, more appetite for risk in the AAA space and hopefully a wider range of gaming experiences.

The reliability techniques that have long been a part of banking architecture can also play a role here, in preventing downtime and reducing the impact of sharding on the overall gaming experience.

About the Author

Ben Evans is the CEO of jClarity, a Java/JVM performance analysis startup. In his spare time he is one of the leaders of the London Java Community and holds a seat on the Java Community Process Executive Committee. His previous projects include performance testing the Google IPO, financial trading systems, and writing award-winning websites for some of the biggest films of the 90s.

Very interesting article! by Stephane Wantiez

Cloud systems of this kind were also used massively by telecom providers, which must handle many thousands of calls per second without any interruption - outages cost them a lot of money, as users either end their calls earlier or cannot place them at all. I've worked for a big French telecom systems vendor, and these systems, along with the cloud of services available for telco operators, were built specifically for telecom, inside proprietary platforms. But in recent years they've moved massively to more open cloud systems, and that's a very good thing!

In the game industry, it will be more and more important to reduce the cost of such infrastructure by having specialized actors provide farms of servers to game developers and publishers, adapting their consumption of servers to the stream of players currently busy with their games. Nowadays, a lot of online games suffer from a lack of investment by their publisher in decent online infrastructure, because the publisher underestimated the number of people playing the game, especially in peak periods...

These peak periods will have to be flattened by the new gaming cloud actors by being more global, providing services to any part of the world transparently, since peak periods differ across continents. You have to scale your infrastructure to handle these peaks efficiently, and you can't have your system idling for the rest of the day!

Moreover, we'll see more and more of a new kind of service for gamers: cloud gaming. Companies like OnLive and Gaikai (acquired by Sony) have servers that handle the computation of game content before sending the video stream to the player, who doesn't have to buy an expensive computer in order to play the latest games. This requires high reliability and very low latency for a game to play as if it were running locally. Sony wants to use Gaikai to provide backward compatibility for old PlayStation games on its new PS4 console without needing dedicated chips for that (as the PS3 had a very different system based on IBM Cell processors).
