Google on the Technical Debt of Machine Learning

A group of Google researchers and engineers presented their view on the technical debt incurred by machine learning at the Software Engineering for Machine Learning workshop, part of the annual NIPS conference held in Montreal. They identified several aspects of technical debt and concluded that, without proper care, using machine learning or complex data analysis in a company can create new kinds of technical debt that differ from those known from classical software engineering.

In the paper, they identified four areas where technical debt accumulates: erosion of boundaries between subsystems, data dependencies, system-level anti-patterns, and issues that arise from changes in the real world.

For example, the study argues that machine learning methods are, by design, systems that combine inputs from many different sources in order to arrive at highly accurate predictions. This means, however, that using such methods inevitably increases entanglement between otherwise well-isolated modules. As a result, a change in a single module can significantly affect overall prediction performance.
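A minimal sketch of this entanglement on toy data, assuming a plain least-squares model without intercept (the paper does not prescribe any particular model; `fit_linear` and the numbers are hypothetical). Adding a second, correlated input changes the weight the model learns for the first input, even though nothing about that first input changed.

```python
from typing import List


def fit_linear(columns: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares weights (no intercept) from the normal equations X^T X w = X^T y."""
    k = len(columns)
    # Augmented matrix [X^T X | X^T y], one row per input column.
    a = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(p * t for p, t in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 2.0, 3.0, 4.0, 6.0]  # a new input, strongly correlated with x1
y = [u + v for u, v in zip(x1, x2)]

print(fit_linear([x1], y))      # weight of x1 when it is the only input (about 2.09)
print(fit_linear([x1, x2], y))  # weight of x1 drops to about 1.0 once x2 is added
```

Any downstream consumer that relied on the old weight of `x1`, or on the model's behavior in general, is affected by a change that touched only a different input.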

Just as reusing libraries creates code dependencies, machine learning methods create data dependencies. One issue the study identifies here is that data sources are often unstable, which in turn makes the prediction module unstable.
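One way to contain an unstable input, sketched here as an assumption rather than taken from this article, is to have the model read a frozen, versioned snapshot of the upstream signal and to switch versions only deliberately. `VersionedSignal` and the feature name `topic_score` are hypothetical.

```python
from typing import Dict, Mapping


class VersionedSignal:
    """Hypothetical wrapper around an upstream data source: the model reads a
    pinned, immutable snapshot, so upstream updates do not silently change its inputs."""

    def __init__(self) -> None:
        self._versions: Dict[int, Dict[str, float]] = {}
        self.pinned = 0  # 0 means nothing has been published yet

    def publish(self, data: Mapping[str, float]) -> int:
        """Store a new snapshot; only the very first one is pinned automatically."""
        version = len(self._versions) + 1
        self._versions[version] = dict(data)
        if self.pinned == 0:
            self.pinned = version
        return version

    def upgrade(self, version: int) -> None:
        """Deliberately switch the model to another snapshot (e.g. after re-validation)."""
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self.pinned = version

    def get(self, key: str) -> float:
        return self._versions[self.pinned][key]


signal = VersionedSignal()
signal.publish({"topic_score": 0.4})
v2 = signal.publish({"topic_score": 0.9})  # upstream changes its output
print(signal.get("topic_score"))  # still the pinned snapshot: 0.4
signal.upgrade(v2)
print(signal.get("topic_score"))  # 0.9, only after an explicit upgrade
```

The trade-off is that the pinned copy goes stale, so someone has to own the decision of when to upgrade.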

A common approach in machine learning is to start from a large pool of possibly informative data sources and let the algorithm pick the relevant ones. Consequently, more data sources end up being used than are strictly necessary. Here, a periodic cleanup can help.
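Such a cleanup can be approached as a leave-one-feature-out ablation: re-evaluate without each data source in turn and flag those whose removal barely changes the score. This is a minimal sketch; `evaluate`, the tolerance, and the feature names are hypothetical, and the toy scoring function stands in for a real train-and-evaluate run.

```python
from typing import Callable, List, Sequence


def removable_features(
    features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    tolerance: float = 0.001,
) -> List[str]:
    """Leave-one-feature-out ablation: return the features whose removal lowers
    the evaluation score (higher is better) by at most `tolerance`."""
    baseline = evaluate(features)
    candidates = []
    for feature in features:
        reduced = [f for f in features if f != feature]
        if baseline - evaluate(reduced) <= tolerance:
            candidates.append(feature)
    return candidates


# Toy stand-in for "train on these features and report validation accuracy".
CONTRIBUTION = {"clicks": 0.30, "dwell_time": 0.15, "legacy_score": 0.0005, "country": 0.0}


def evaluate(features: Sequence[str]) -> float:
    return 0.5 + sum(CONTRIBUTION[f] for f in features)


print(removable_features(list(CONTRIBUTION), evaluate))  # ['legacy_score', 'country']
```

Removing features one at a time can miss redundancy: two correlated sources may each look removable on their own while dropping both hurts, so it is safer to remove one candidate and re-run the check.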

The study also points out that tools for tracking, documenting, and resolving data dependencies are important, much like the equivalent tools for code dependencies.
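A data dependency tracker can start as little more than a graph from inputs to the artifacts built from them, which answers the question "what is affected if this source changes?". This is a minimal sketch; the class and the artifact names are hypothetical and not a tool described in the paper.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, Set


class DataDependencyGraph:
    """Hypothetical registry recording which artifacts (features, models) consume which inputs."""

    def __init__(self) -> None:
        self._consumers: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, consumer: str, inputs: Iterable[str]) -> None:
        """Declare that `consumer` is built from each of `inputs`."""
        for source in inputs:
            self._consumers[source].add(consumer)

    def downstream(self, source: str) -> Set[str]:
        """Every artifact transitively affected by a change to `source` (breadth-first search)."""
        affected: Set[str] = set()
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for consumer in self._consumers.get(node, set()):
                if consumer not in affected:
                    affected.add(consumer)
                    queue.append(consumer)
        return affected


graph = DataDependencyGraph()
graph.register("feature:user_ctr", ["logs:clicks", "logs:impressions"])
graph.register("feature:query_topic", ["table:topic_lexicon"])
graph.register("model:ad_ranker", ["feature:user_ctr", "feature:query_topic"])

print(sorted(graph.downstream("table:topic_lexicon")))
# ['feature:query_topic', 'model:ad_ranker']
```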

From a system architecture perspective, using machine learning methods often leads to a number of common software design anti-patterns. Especially when using general-purpose machine learning software, a large amount of glue code has to be written to integrate it with the rest of the system. Here, a clean rewrite can be helpful.
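The article only mentions a clean rewrite. A related way to keep glue code from spreading, sketched here as an assumption, is to put the general-purpose package behind a small internal interface so that only one adapter knows its API. `Predictor`, `ThirdPartyModel` (a stand-in for a real package), and `ThirdPartyAdapter` are hypothetical.

```python
from typing import Dict, List, Protocol, Sequence


class Predictor(Protocol):
    """Hypothetical internal interface the rest of the system is written against."""

    def predict(self, rows: Sequence[Dict[str, float]]) -> List[float]: ...


class ThirdPartyModel:
    """Stand-in for a general-purpose ML package with its own conventions
    (here: positional feature vectors instead of named features)."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights

    def score_matrix(self, matrix: List[List[float]]) -> List[float]:
        return [sum(w * x for w, x in zip(self.weights, row)) for row in matrix]


class ThirdPartyAdapter:
    """The only place that knows about the package's API: all glue code lives here."""

    def __init__(self, model: ThirdPartyModel, feature_order: List[str]) -> None:
        self._model = model
        self._feature_order = feature_order

    def predict(self, rows: Sequence[Dict[str, float]]) -> List[float]:
        # Translate named features into the package's positional layout; missing features become 0.0.
        matrix = [[row.get(name, 0.0) for name in self._feature_order] for row in rows]
        return self._model.score_matrix(matrix)


def rank(predictor: Predictor, rows: Sequence[Dict[str, float]]) -> List[int]:
    """Application code: indices of `rows`, best score first. It never touches the package directly."""
    scores = predictor.predict(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


adapter = ThirdPartyAdapter(ThirdPartyModel([2.0, 0.5]), ["clicks", "age_days"])
rows = [{"clicks": 1.0}, {"clicks": 3.0, "age_days": 2.0}]
print(adapter.predict(rows))  # [2.0, 7.0]
print(rank(adapter, rows))    # [1, 0]
```

Replacing the package then means writing one new adapter rather than touching every caller.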

Experimenting with alternative analysis approaches often leaves behind dead code paths unless they are cleaned up regularly.
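One lightweight habit for keeping experimental code paths from going dead unnoticed (an assumption here, not a practice the article describes) is to register every experiment flag with an owner and an expiry date and to list expired flags regularly. The dataclass and the flag names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ExperimentFlag:
    name: str
    owner: str
    expires: date  # by this date the experimental branch should be promoted or deleted


def stale_flags(flags: List[ExperimentFlag], today: date) -> List[str]:
    """Names of experiment flags past their expiry date, i.e. code paths due for cleanup."""
    return sorted(flag.name for flag in flags if flag.expires < today)


flags = [
    ExperimentFlag("new_tokenizer", "alice", date(2015, 3, 31)),
    ExperimentFlag("ctr_smoothing_v2", "bob", date(2014, 12, 1)),
]
print(stale_flags(flags, today=date(2015, 1, 15)))  # ['ctr_smoothing_v2']
```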

In general, the study advocates a more integrated approach to research and engineering roles. At Google, these roles are usually combined within the same team, sometimes even in the same person.

Finally, machine learning methods that run in production have to deal with real-world data that evolves over time. This incurs another kind of technical debt, because one has to ensure that prediction performance remains stable. Here, the paper finds, monitoring prediction performance and other basic statistics of the data to detect changes can help.
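A minimal sketch of such monitoring, assuming the simplest possible check: compare the mean of each monitored statistic (including the model's own predictions) in a recent window against a reference window, and alert when it moves by more than a few reference standard deviations. The threshold of 3 and the statistic names are hypothetical; a real deployment may need more robust tests.

```python
import statistics
from typing import Dict, List, Sequence


def mean_shift(reference: Sequence[float], current: Sequence[float]) -> float:
    """Shift of the current window's mean, in units of the reference window's standard deviation."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    cur_mean = statistics.fmean(current)
    if ref_std == 0:
        return 0.0 if cur_mean == ref_mean else float("inf")
    return abs(cur_mean - ref_mean) / ref_std


def drift_alerts(
    reference: Dict[str, Sequence[float]],
    current: Dict[str, Sequence[float]],
    threshold: float = 3.0,
) -> List[str]:
    """Names of monitored statistics whose mean moved by more than `threshold` reference standard deviations."""
    return sorted(name for name in reference if mean_shift(reference[name], current[name]) > threshold)


reference = {
    "prediction": [0.10, 0.12, 0.11, 0.09, 0.13],  # model outputs in an earlier window
    "query_length": [3, 4, 5, 4, 4],                # an input statistic
}
current = {
    "prediction": [0.30, 0.28, 0.31, 0.29, 0.32],
    "query_length": [4, 4, 5, 3, 4],
}
print(drift_alerts(reference, current))  # ['prediction']
```

A mean-shift check is cheap but blind to changes that keep the mean fixed, such as a distribution that widens; tracking a few quantiles alongside the mean is a natural extension.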

Google is well known for relying on machine learning and complex data analysis for a number of its core services. For example, it uses prediction models to optimize ad placement, and services such as image search are also driven by machine learning. Google has recently invested heavily in this area, for example by acquiring the deep learning startups DeepMind and DNNresearch. Other companies are following similar paths; Facebook, for example, has set up a machine learning lab in New York.
