Q&A with Christoph Windheuser on AI Applications in the Industry

Increased hardware power and huge amounts of data are making long-established machine learning approaches such as pattern recognition, natural language processing, and reinforcement learning practical at scale. Artificial intelligence is also impacting the development process: it increases the complexity of things like version control, CI/CD, and testing.

Christoph Windheuser, global head of Artificial Intelligence at ThoughtWorks, spoke about AI applications in the industry at GOTO Berlin 2018. InfoQ is covering this conference with Q&As, summaries, and articles.

The biggest advantage of machine learning approaches is that the behavior of an algorithm can be optimized by learning from data instead of being programmed, argued Windheuser. This means that algorithms can achieve behavior which is not possible to program explicitly, like optical and acoustic pattern recognition or natural language processing. With that, completely new applications become possible, and the possibilities are endless, he said.

Windheuser stated that in data science and machine learning projects, developers not only have to take care of their programming code, but also of vast amounts of data, like the training patterns, the features extracted from those patterns, and the parameters and hyper-parameters of the learning algorithm. This brings a new dimension of complexity into the development process, he argued.

InfoQ spoke with Windheuser after his talk.

InfoQ: What is possible with AI?

Christoph Windheuser: With the vastly increased hardware power and huge amounts of data available today, old and well-known machine learning approaches can suddenly be applied in a scalable and operational way. This includes all kinds of pattern recognition, such as speech recognition and image recognition. In the area of natural language processing, things like language translation, sentiment analysis, intent recognition, text-to-speech, and chatbots are well known. With reinforcement learning, even strategies for playing video games, chess or Go, or for driving a car smoothly and safely can be learned. All of this would be impossible without machine learning approaches.

InfoQ: What role does data play in applying AI?

Windheuser: Data is the foundation of any machine learning algorithm. For supervised learning such as backpropagation, you need significantly more training patterns than the parameters (weights) you optimize in order to achieve good generalization of your network. And for deep learning models with many layers and many units per layer, the number of parameters can easily run into the millions, which requires an even bigger amount of training patterns for successful training.
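
To make the relationship between parameters and training patterns concrete, here is a minimal sketch (not from the interview) that counts the weights and biases of a hypothetical fully connected network; the layer sizes are made up for illustration.

```python
# Hypothetical layer sizes: input, three hidden layers, output.
layer_sizes = [784, 1024, 1024, 512, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out  # one weight per connection between the two layers
    biases = n_out          # one bias per unit in the next layer
    total_params += weights + biases

print(f"Trainable parameters: {total_params:,}")
# Prints roughly 2.4 million parameters; by the rule of thumb above, successful
# supervised training would need substantially more labelled training patterns.
```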

For supervised learning, the training patterns require labels (for example, the correct classification of each pattern), which usually have to be curated by hand. In addition, the data has to be brought into the right form to be digested by the learning algorithm. This means that the right features have to be extracted from the training data, which is very important for reaching a good result with your training algorithm.

For example, if you want to learn the future customer demand for articles in a grocery shop, you might use the historical sales data to predict future sales. You could use the POS (point of sale) data directly to train your network, but it is very helpful to extract, for example, the weekday out of the timestamp in the POS data and feed it as an extra feature into the network. Because customer demand depends heavily on the weekday, this helps the network learn and converge more easily and quickly.
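
As a concrete illustration of this kind of feature extraction, here is a minimal pandas sketch; the POS column names and values are hypothetical, not from an actual project.

```python
import pandas as pd

# Hypothetical point-of-sale (POS) records; column names are made up for illustration.
pos = pd.DataFrame({
    "timestamp":  ["2018-11-05 09:13", "2018-11-10 17:42", "2018-11-11 12:05"],
    "article_id": [4711, 4711, 4712],
    "units_sold": [3, 8, 5],
})
pos["timestamp"] = pd.to_datetime(pos["timestamp"])

# Extract the weekday from the timestamp as an explicit feature
# (0 = Monday ... 6 = Sunday), since demand depends strongly on it.
pos["weekday"] = pos["timestamp"].dt.dayofweek

# One-hot encode the weekday so the model does not treat it as an ordinal value.
weekday_features = pd.get_dummies(pos["weekday"], prefix="weekday")
training_data = pd.concat([pos[["article_id", "units_sold"]], weekday_features], axis=1)
print(training_data)
```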

InfoQ: How will AI change software development and deployment processes?

Windheuser: AI is changing the development process in many ways already. To be able to roll back in case something goes wrong or training experiments do not show the expected results, the code and all of the associated data, such as training patterns, extracted features, and model parameters, have to be set back to a defined and consistent point in the past. We have had good experience with the open-source tool DVC (Data Version Control, dvc.org), which is capable of doing this even if the data sits in remote cloud buckets.
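
A minimal sketch of what this can look like from Python, assuming DVC's dvc.api is used; the file path, repository URL, and revision tag are hypothetical and only illustrate the idea of pinning code and data to the same revision.

```python
import dvc.api

# Read a dataset version that is pinned to a specific Git revision, so that code
# (via Git) and data (via the DVC pointer files committed at that revision) can
# be rolled back together, even if the files live in a remote cloud bucket.
with dvc.api.open(
    "data/training_patterns.csv",                    # hypothetical DVC-tracked path
    repo="https://example.com/demand-forecast.git",  # hypothetical repository
    rev="experiment-42",                             # hypothetical Git tag or commit
) as f:
    training_data = f.read()
```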

Also, the setup of the continuous integration / continuous delivery (CI/CD) environment is getting more complex. You usually have several streams and pipelines running in parallel, for example one for the application development and one for the data science and machine learning models. If you distribute the training over several machines to speed it up, you have to use fan-out and fan-in in your pipelines to parallelize the training and then synchronize for testing afterwards. In a lot of our projects we are using GoCD, an open-source tool developed by ThoughtWorks which is capable of managing such complex CI/CD environments.

Testing applications with machine learning components also gets more demanding. Besides the unit tests of the different components, you have to test the success of the training phase with KPIs such as the error rate or the confusion matrix achieved on an independent test set. And in a lot of cases, the functional tests cannot be automated completely. Look at a chatbot, for example: its functionality cannot be tested through automation alone, because that would cover only a small part of the dialogues people would actually have with the chatbot. Because of the high effort of functional testing for machine learning applications, we often speak of a "test table" instead of the classical "test pyramid".
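
As an illustration of such a training-quality check, here is a minimal sketch (not from the talk) using scikit-learn; the model object, test data, and threshold are hypothetical.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def test_training_quality(trained_model, X_test, y_test, max_error_rate=0.05):
    """Fail if the trained model misses its KPI on an independent, held-out test set."""
    y_pred = trained_model.predict(X_test)

    error_rate = 1.0 - accuracy_score(y_test, y_pred)
    print("Confusion matrix on the independent test set:")
    print(confusion_matrix(y_test, y_pred))

    # A CI/CD pipeline would typically fail this stage when the KPI is not met.
    assert error_rate <= max_error_rate, (
        f"Error rate {error_rate:.3f} exceeds the allowed {max_error_rate:.3f}"
    )
```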

InfoQ: What will the future bring us in AI?

Windheuser: This is hard to predict because the field is changing rapidly and a lot of money is being spent on research and development worldwide. To bring AI and its applications to a new level, new scientific breakthroughs are necessary, for example in the areas of unsupervised learning, acquiring real-world knowledge, or reasoning. This can happen within the next months or years, or it can take decades, with another AI winter in between. I truly believe that eventually Artificial General Intelligence (AGI) with superhuman intelligence will be possible, but this will still take a long time.
