James Gosling Shares Wisdom Related to IoT at Devoxx: Code on the Edge and Its Hurdles

In his Devoxx talk, James Gosling, the father of Java, zooms in on the technicalities of writing code for devices on the network's edge. Drawing on a career spent developing software for devices ranging from satellites to autonomous submarines, he provides practical advice for those moments when the hardware is at the bottom of the sea, or when minor errors could cause mayhem or even fatalities.

Ever since his first software development job, working on the Isis II satellite, Gosling has dealt with software that touches the real world, often at gigantic scale, with fleets of millions of devices. It is the kind of software that falls outside the usual enterprise application patterns, or as he described it:

As soon as you start touching the world, there are tens of thousands of things that interact with the network and not necessarily through a browser …

The moment you leave the safe landscape of the cloud (where operating the software is usually the job of a dedicated operations team), you have to start thinking about non-functional requirements: monitoring, application administration, and security, usually for gargantuan fleets of millions of interconnected devices.

Even though the eight fallacies of distributed computing have come to be deemed irrelevant in the cloud space, on the edge they still remain fallacies. In the IoT space, the network is imperfect: latency, bandwidth, and connectivity are all variables, so you need to adapt to live with quicksand and smoke. In an environment where connectivity is a moving piece, critical messages must always get through: for example, when sending engine telemetry data, the "engine is on fire" message takes precedence. Minute-by-minute telemetry data, such as engine temperature or other parameters, might remain on board indefinitely, kept for later analysis.
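The prioritization described above can be sketched in Java as a small outbox that always sends critical messages first and leaves routine telemetry on board when the connectivity window is too short. This is only an illustration under stated assumptions: the names `TelemetryOutbox`, `Priority`, `enqueue`, and `drain`, and the idea of a per-window message budget, are hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: all names here are illustrative, not from the talk.
class TelemetryOutbox {
    // Declaration order defines precedence: CRITICAL sorts before ROUTINE.
    enum Priority { CRITICAL, ROUTINE }

    private static final class Message {
        final Priority priority;
        final long sequence;   // preserves arrival order within a priority
        final String body;

        Message(Priority priority, long sequence, String body) {
            this.priority = priority;
            this.sequence = sequence;
            this.body = body;
        }
    }

    private final PriorityQueue<Message> queue = new PriorityQueue<>(
            Comparator.comparing((Message m) -> m.priority)
                      .thenComparingLong(m -> m.sequence));
    private long nextSequence = 0;

    void enqueue(Priority priority, String body) {
        queue.add(new Message(priority, nextSequence++, body));
    }

    // Sends at most `budget` messages while the link is up (assumed budget model);
    // whatever does not fit stays queued on the device.
    List<String> drain(int budget) {
        List<String> sent = new ArrayList<>();
        while (budget-- > 0 && !queue.isEmpty()) {
            sent.add(queue.poll().body);
        }
        return sent;
    }

    int pending() {
        return queue.size();
    }
}
```

With two routine temperature readings queued ahead of an "engine is on fire" message and a budget of two, the critical message goes out first and one reading stays on board for a later window.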

When developing for the cloud, you can just fail fast, and "the gods above you will handle the rest": the ops teams make sure the infrastructure runs smoothly. On the edge, that is not possible: you cannot reset a device in the middle of its operation. When the devices are in the middle of nowhere, failure of the system is not an option, and failure of the parts is normal. In this case, you need to build modular systems that keep the blast radius at a micro-scale, allowing you to isolate faulty parts and keep the whole going. Java was designed for this: garbage collection, try-catch, and Java's pointers were all designed to keep the impact of faults as small as possible. As a result, constructions considered anti-patterns in enterprise code become a plain necessity in the world of devices:

try {
    leapOfFaith();   // may fail in ways nobody anticipated
} catch (Throwable t) {
    xyzzy();         // last-resort handler so the device keeps running
}
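One way to picture how such a catch-all keeps the blast radius small is a supervisor that runs each module inside its own `try`/`catch (Throwable)` and sets a failing module aside while the others carry on. The sketch below is an assumption-laden illustration: `ModuleSupervisor`, `register`, `runCycle`, and the quarantine policy are hypothetical names and choices, not something presented in the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names and the quarantine policy are illustrative only.
class ModuleSupervisor {
    private final Map<String, Runnable> modules = new LinkedHashMap<>();
    private final Map<String, Throwable> quarantined = new LinkedHashMap<>();

    void register(String name, Runnable module) {
        modules.put(name, module);
    }

    // Runs one cycle of every healthy module. A module that throws anything,
    // including an Error, is quarantined instead of taking the device down.
    void runCycle() {
        for (Map.Entry<String, Runnable> entry : modules.entrySet()) {
            if (quarantined.containsKey(entry.getKey())) {
                continue;   // skip parts already known to be faulty
            }
            try {
                entry.getValue().run();
            } catch (Throwable t) {
                quarantined.put(entry.getKey(), t);   // keep the cause for diagnostics
            }
        }
    }

    boolean isQuarantined(String name) {
        return quarantined.containsKey(name);
    }
}
```

A real device would likely try to restart or degrade a quarantined module rather than skip it forever; that policy is left out here to keep the sketch short.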

The code needs to be robust in the face of undiscovered bugs, and it should allow an operation to be retried several times, since the failure may be transient. In these ecosystems, even updates can be challenging: imagine a heterogeneous collection of hardware versions in different states of operation. Updates therefore need to be negotiated and applied only when the situation permits. When operating at scale, the option of rollback and different phasing-out strategies are important.
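Retrying in the face of a possibly transient failure could look roughly like the following Java helper. It is a minimal sketch under assumptions of its own: the class name `TransientRetry`, the `withRetries` signature, and the doubling delay between attempts are illustrative choices, not details from the talk.

```java
import java.util.function.Supplier;

// Hypothetical sketch: the helper name and the backoff policy are assumptions.
class TransientRetry {
    // Calls `action` up to `maxAttempts` times, doubling the pause after each
    // failure, and rethrows the last failure if every attempt fails.
    static <T> T withRetries(Supplier<T> action, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;   // remember the failure; it may be transient
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();   // preserve interrupt status
                    break;                                // stop retrying when interrupted
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```

For example, `TransientRetry.withRetries(() -> sendReport(), 5, 100)` would attempt a hypothetical `sendReport()` up to five times, pausing 100 ms, 200 ms, 400 ms, and 800 ms between attempts.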

In conclusion, the most important rule is that your devices cannot become bricks. In the Internet of Things, updates, testing, and management all need to be done at scale. So you have to ensure not only that the application is tested as thoroughly as possible, but also that it can keep operating with just part of the system in place.
