Automation with humans in mind: making complex systems predictable, reliable and humane

Brian Troutwine gave a talk on complex and real-time systems at DevOps Days Belgium, covering the interactions between humans and machines, with examples of automation done right and done wrong.

Real-time systems are not simply those that run as fast as possible; they are systems whose results have a deadline. They are classified as:

soft, where the usefulness of a result degrades after its deadline.
firm, where infrequent deadline misses are tolerable, but may degrade the system's quality of service.
hard, where missing a deadline is a total system failure.
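To make the three categories concrete, here is a minimal Python sketch of how the value of a result might depend on when it arrives relative to its deadline. The names (`Deadline`, `result_utility`, `SystemFailure`) and the linear decay used for soft deadlines are illustrative assumptions, not something taken from the talk; treating a late firm result as worthless follows the common textbook definition of a firm deadline.

```python
from enum import Enum


class Deadline(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


class SystemFailure(Exception):
    """Raised when a hard deadline is missed: the whole system has failed."""


def result_utility(kind: Deadline, finished_at: float, deadline: float,
                   decay_window: float = 1.0) -> float:
    """Return the usefulness of a result in [0, 1] (hypothetical model).

    Assumptions for illustration only:
    - a soft result loses value linearly over `decay_window` seconds
      (must be > 0) after the deadline;
    - a late firm result is discarded (worth 0) but the system keeps running;
    - a late hard result means total system failure.
    """
    if finished_at <= deadline:
        return 1.0  # on time: full value for every category
    lateness = finished_at - deadline
    if kind is Deadline.SOFT:
        # Usefulness degrades gradually after the deadline.
        return max(0.0, 1.0 - lateness / decay_window)
    if kind is Deadline.FIRM:
        # The result is useless, but an infrequent miss is tolerated.
        return 0.0
    # Hard deadline missed: not a degraded result, a failed system.
    raise SystemFailure(f"hard deadline missed by {lateness:.3f}s")
```

For example, a soft result finishing half a second late with a one-second decay window is worth half as much, a late firm result is dropped while the system carries on, and a late hard result raises `SystemFailure`. Tracking how often firm misses occur, to detect a degrading quality of service, would be a natural extension.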

Complex systems typically:

have non-linear feedback.
are coupled to external systems.
are difficult to model and to understand.

Brian contrasted two views of human-machine interaction, humans cooperating with machines and humans versus machines, illustrated respectively by the Apollo 13 and Chernobyl disasters.

In the early space programs, NASA was unsure how to fit humans into the system, and there were experiments with fully automatic rockets that followed the engineers' best understanding of flight. The astronauts, who were test pilots, saw things differently: to them a rocket was a more elaborate plane. The resulting design could be controlled by both humans and computers, following a supervisory model that struck a balance between the two. During the Apollo 13 disaster, it was the astronauts' expert knowledge, their ability to figure things out and adapt the system to their needs, that saved them.

Tools help experts overcome catastrophic failure:

Automation, done right, relieves tedium.
Automation, done right, reduces errors.
Automation, done right, liberates.

In the second example, the Chernobyl disaster, the reactor was designed so that, once it started to fail, it fed back into itself toward catastrophe. During a test of a backup system, the reactor was driven into a failure-prone state, and due to its design and poor management, warning signs were ignored and human input was not wanted. Furthermore, vital equipment was unavailable because it was locked in a safe. In this environment, where humans were not trusted, the reactor failed according to its nature. The machine was in control; as a result many people died, and an entire region of Ukraine remains abandoned.

Automation, done wrong, mechanizes humans: it does nothing to inform them.
Automation, done wrong, misdirects: humans do not get the right information.
Automation, done wrong, entraps.

Every system carries the potential for its own destruction. There is no escaping "normal accidents": failure is inevitable. The design of every system must take failure into account; otherwise the system will fail in completely arbitrary ways.

When designing complex systems, recognize the failings of humans and extend their abilities with automation. Never design alone: to avoid implicit bias, bring in other people who can offer different views.

Have resources you're willing to sacrifice.
Accept failure, learn from it.
Study the accidents of others.
Some things are not worth building because the cost of failure is too high.
Understand what you build.

The talk was part of the second day of DevOps Days. A summary of the first day's talks is also available on InfoQ.
