Will AI Surpass Human Intelligence? Interview with Prof. Jürgen Schmidhuber on Deep Learning

Photo of Juergen Schmidhuber
Photo Credits: Wort & Bild Verlag /
Eleana Hegerich

Machine learning has become a buzzword in the media these days. Science magazine recently published a cover paper on "Human-level concept learning through probabilistic program induction", and shortly afterwards Nature devoted its cover story to AlphaGo, an AI program that defeated the European Go champion.

Late on Tuesday night, Google's DeepMind AI group will play one of the world's best human Go players, Lee Se-dol of South Korea. The game will be live streamed on YouTube, and the stream is embedded at the end of this story.

Many are now discussing the potential of artificial intelligence, asking questions such as "Can machines learn like a human?", "Will artificial intelligence surpass human intelligence?", and so on. To answer such questions, InfoQ interviewed Prof. Jürgen Schmidhuber, Scientific Director of The Swiss AI Lab IDSIA. He will tell you more about deep learning as well as the latest trends and development in artificial intelligence.

InfoQ: What is Deep Learning and its history?

Schmidhuber: It is a new branding of an old hat. It is mostly about deep neural networks with many subsequent processing stages, not just a few. With today’s faster computers, such nets have revolutionized Pattern Recognition and Machine Learning. The term "Deep Learning" itself was first introduced to Machine Learning by Dechter in 1986, and to Artificial Neural Networks (NNs) by Aizenberg et al. in 2000.

The father of Deep Learning is the Ukrainian mathematician Ivakhnenko. In 1965 (with Lapa) he published the first general, working learning algorithm for supervised deep feedforward multilayer perceptrons. In 1971, he already described a network with 8 layers, deep even by present standards, trained by a method that was still popular in the new millennium. He was far ahead of his time - back then, computers were a billion times slower than today. More history in my survey with 888 references and the “Deep Learning” entry at Scholarpedia.

InfoQ: What is your opinion about the Science paper on Human-level concept learning, which achieved 'one-shot learning' through the Bayesian program learning (BPL) framework?

Schmidhuber: The paper is interesting. However, one can also achieve fast one-shot learning through standard transfer learning, by first “slowly” training a deep neural net on many different visual training sets, such that the first 10 layers become a pretty general vision preprocessor, then freezing those 10 layers and retraining only the 11th top layer with a high learning rate on new images. This has worked well for years.
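The freeze-and-retrain recipe described above can be illustrated with a deliberately tiny sketch. It is not the setup used in any of the cited work: the function `frozen_features` is a hypothetical stand-in for the pretrained lower layers, the data is made up, and the "top layer" is a single logistic unit trained by plain gradient descent.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for the frozen lower layers: a fixed mapping
    that is never updated during retraining."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias input

def predict(top_weights, x):
    """Logistic top layer applied to the frozen features."""
    z = sum(w * f for w, f in zip(top_weights, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))

def retrain_top_layer(examples, learning_rate=1.0, epochs=200):
    """Train only the top-layer weights; a high learning rate is affordable
    because so few parameters are being adapted."""
    top_weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, label in examples:
            error = predict(top_weights, x) - label
            feats = frozen_features(x)
            top_weights = [w - learning_rate * error * f
                           for w, f in zip(top_weights, feats)]
    return top_weights

# One made-up example per class.
few_examples = [((2.0, 1.0), 1), ((-1.0, -2.0), 0)]
top = retrain_top_layer(few_examples)
```

Only `top` changes during retraining; everything inside `frozen_features` stays fixed, which is what makes learning from a handful of new examples cheap.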

InfoQ: How would you compare Bayesian methods with deep learning methods? Which is more feasible and why?

Schmidhuber: The ultimate optimal Bayesian approach to machine learning is embodied by the AIXI model (2002) of my former postdoc (now professor) Marcus Hutter. Any computational problem can be phrased as the maximization of a reward function. AIXI is based on Solomonoff's universal mixture M of all computable probability distributions. If the probabilities of the world's responses to some reinforcement learning agent's actions are computable (there is no physical evidence against that), then the agent may predict its future sensory inputs and rewards using M instead of the true but unknown distribution. The agent can indeed act optimally by choosing those action sequences that maximize M-predicted reward. This may be dubbed the unbeatable, ultimate statistical approach to AI - it demonstrates the mathematical limits of what's possible. However, AIXI’s notion of optimality ignores computation time, which is the reason why we are still in business with less universal but more practically feasible approaches such as deep learning based on more limited local search techniques such as gradient descent.  
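To make the idea of predicting with a mixture concrete, here is a toy sketch. It is emphatically not Solomonoff's M, which ranges over all computable distributions and cannot be computed; this version assumes just three hypothetical coin-bias hypotheses and a made-up observation sequence, and shows only the Bayesian mechanics of weighting hypotheses by how well they explain the data.

```python
def mixture_predict(weights, hypotheses):
    """Probability that the next bit is 1 under the weighted mixture."""
    return sum(w * p for w, p in zip(weights, hypotheses))

def bayes_update(weights, hypotheses, bit):
    """Reweight each hypothesis by the likelihood it assigned to the bit."""
    likelihoods = [p if bit == 1 else 1.0 - p for p in hypotheses]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Each hypothesis is a candidate probability that the next bit is 1.
hypotheses = [0.1, 0.5, 0.9]
weights = [1 / 3, 1 / 3, 1 / 3]  # uniform prior (an assumption of this toy)

for bit in [1, 1, 0, 1, 1, 1, 1, 1]:  # made-up observations
    weights = bayes_update(weights, hypotheses, bit)

prediction = mixture_predict(weights, hypotheses)
```

After mostly seeing ones, the weight concentrates on the 0.9 hypothesis and the mixture's prediction moves towards it. Even in this toy the cost grows with the number of hypotheses, which hints at why a mixture over all computable distributions ignores computation time.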

InfoQ: The Science paper describes the result as “passing the visual Turing test”. Is the Turing test, which was devised more than half a century ago, still valid today?

Schmidhuber: Does my chat partner seem human to me? Then it has passed my personal Turing Test. The main problem with this test is that it is so subjective, as illustrated by Weizenbaum many decades ago. Some get fooled more easily than others.

InfoQ: What do you think of Google DeepMind’s Nature paper on AlphaGo, a program that beat a professional Go player? Is AlphaGo a big breakthrough in this area? What helps AlphaGo to achieve this?

Schmidhuber: I am happy about Google DeepMind’s success, also because the company is heavily influenced by my former students: two of DeepMind's first four members and their first PhDs in AI came from IDSIA, one of them co-founder, one of them first employee; other ex-PhD students of mine joined DeepMind later, including a co-author of our paper on Atari-Go in 2010.

Go is a board game where the Markov assumption holds: in principle, the current input (the board state) conveys all the information needed to determine an optimal next move (no need to consider the history of previous states). That is, the game can be tackled by traditional reinforcement learning (RL), a bit like 2 decades ago, when Tesauro at IBM used RL to learn from scratch a backgammon player comparable to the human world champion (1994). Today, however, we are greatly profiting from the fact that computers are at least 10,000 times faster per dollar. In the last few years, automatic Go players have greatly improved. To learn a good Go player, DeepMind’s system combines several traditional methods such as supervised learning (from human experts) and RL based on Monte Carlo Tree Search. It will be interesting to see the system play against the best human Go player in the near future.

Unfortunately, however, the Markov condition does not hold in realistic real world scenarios. That’s why real-world games such as football are much harder than chess or Go, and Artificial General Intelligence (AGI) for RL robots living in partially observable environments will need more sophisticated learning algorithms, e.g., RL for recurrent neural networks.

For a comprehensive history of deep RL, have a look at Section 6 of my survey.

InfoQ: Google DeepMind recently announced that it is entering the healthcare market. What do you think of that?

Schmidhuber: We are very interested in healthcare applications of deep learning. In fact, in 2012, our team at IDSIA (first author Dan Ciresan) had the first Deep Learner to win a medical imaging contest. And I am glad to see that many companies are now also using deep learning for medical imaging and similar applications. The world spends over 10% of GDP on healthcare (over 7 trillion USD per year), much of it on medical diagnosis through expensive experts. Partial automation of this could not only save billions of dollars, but also make expert diagnostics accessible to many who currently cannot afford it. In this context, the most valuable asset of hospitals may be their data – that’s why IBM spent a billion on a company that collected such data.

InfoQ: What do you think about IBM's new Watson Internet of Things Platform? What is the potential of AI in the field of Internet of Things? Will "AI as a service" be a promising trend for AI?

Schmidhuber: The Internet of Things (IoT) will be much larger than the Internet of Humans (IoH), because there are many more machines than humans. And many machines will indeed provide "AI as a service" to other machines. Advertisements make IoH profitable; however, the business model for IoT seems less obvious.

InfoQ: Some say the future is about unsupervised learning – would you agree?

Schmidhuber: I’d say even the past was about unsupervised learning, which is about detecting regularities in the observations without a teacher, which is essentially about adaptive data compression, e.g., through predictive coding. I published my first paper on this a quarter century ago - this actually led in 1991 to the first working "very deep learner" that could deal with hundreds of subsequent computational stages.

InfoQ: Can machines learn like a human?

Schmidhuber: Not yet, but perhaps soon. See also this report on “learning to think”: unsupervised data compression (as in the previous question) is a central ingredient of RNN-based adaptive agents that exploit RNN-based predictive world models to better plan and achieve goals. We first published on that line of research in 1990, and have made a lot of progress since then.

InfoQ: Is there a limit of artificial intelligence?

Schmidhuber: The limits are essentially the limits of computability identified 85 years ago by Kurt Gödel, the founder of theoretical computer science (1931). Gödel showed that traditional math is either flawed in a certain algorithmic sense or contains true statements that cannot be proven through computational procedures, neither by humans nor by AIs.

InfoQ: In your eyes, what is the ideal division of work between humans and computers?

Schmidhuber: Humans should do zero percent of the hard and boring work, computers the rest.

InfoQ: You are well known for the seminal work on Recurrent Neural Networks (RNNs), in particular, Long Short-Term Memory (LSTM), which has been widely used in deep learning today. Can you give us a short background and technical description of LSTM? What areas do you think LSTM is most suited to? Are there any real-world examples? 

Schmidhuber: Supervised LSTM RNNs are general-purpose computers that can learn parallel-sequential programs dealing with all kinds of sequences such as video and speech. They have been developed since the early 1990s in my lab through outstanding PhD students and postdocs including Sepp Hochreiter, Felix Gers, Alex Graves, Santi Fernandez, Faustino Gomez, Daan Wierstra, Justin Bayer, and others. Parts of LSTM RNNs are designed such that backpropagated errors can neither vanish nor explode, but flow backwards in "civilized" fashion for thousands or even more steps. Thus LSTM variants could learn previously unlearnable “Very Deep Learning” tasks that require discovering the importance of (and memorizing) events that happened thousands of discrete time steps ago, while previous standard RNNs already failed with minimal time lags of as few as 10 steps. It is even possible to evolve good problem-specific LSTM-like topologies.
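Why errors vanish or explode in a standard RNN, and why a per-step factor of exactly 1 avoids both, can be seen from a simplified scalar model. The sketch below assumes the backpropagated error is multiplied by one constant factor per time step; real networks multiply by Jacobian matrices that vary over time, and the specific factors 0.9, 1.1 and 1.0 are chosen purely for illustration.

```python
def backpropagated_error_scale(per_step_factor, steps):
    """Scale of an error signal after flowing back `steps` time steps,
    assuming it is multiplied by the same factor at every step."""
    scale = 1.0
    for _ in range(steps):
        scale *= per_step_factor
    return scale

steps = 1000
vanishing = backpropagated_error_scale(0.9, steps)   # factor below 1: signal dies out
exploding = backpropagated_error_scale(1.1, steps)   # factor above 1: signal blows up
preserved = backpropagated_error_scale(1.0, steps)   # factor of exactly 1: signal survives
```

In this simplified picture, keeping the factor at exactly 1 along the memory cell's self-connection is what lets the error flow back unchanged over a thousand steps, while any factor slightly below or above 1 makes it shrink or grow exponentially with the time lag.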

Around 2007, LSTM trained by our CTC (2006) started to revolutionize speech recognition, outperforming traditional methods in keyword spotting tasks (2007). At Google, LSTM later also helped to improve the state of the art in image captioning (2014), machine translation (2014), text-to-speech synthesis (2015, now available for Google Android), syntactic parsing for natural language processing (2015), and many other applications. In 2015, CTC-trained LSTM dramatically improved Google Voice (by 49%), which is now available to over a billion smartphone users. Microsoft and IBM and other famous companies are also heavily using LSTM.

InfoQ: Your team won nine international pattern recognition competitions, such as handwriting recognition and traffic sign recognition, to name just a few. How did you achieve this?

Schmidhuber: My team is indeed proud to have won multiple contests including:

  • MICCAI 2013 Grand Challenge on Mitosis Detection
  • ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
  • ISBI 2012 Brain Image Segmentation Challenge
  • IJCNN 2011 Traffic Sign Recognition Competition
  • ICDAR 2011 offline Chinese Handwriting Competition
  • Online German Traffic Sign Recognition Contest
  • ICDAR 2009 Arabic Connected Handwriting Competition
  • ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
  • ICDAR 2009 French Connected Handwriting Competition

How did the team achieve this? Through creativity, persistence, hard work and dedication.

InfoQ: You also laid special importance on very deep nets, didn’t you?

Schmidhuber: Since depth implies computational power and efficiency, we have focused on very deep neural nets from the start. For example, by the early 1990s, others were still limited to rather shallow nets with fewer than 10 subsequent computational stages, while our methods already enabled over 1,000 such stages. I'd say we were the ones who made neural nets really deep, especially recurrent networks, the deepest and most powerful nets of them all. Back then, few researchers were interested in this, but we kept going, and with cheaper and cheaper computing power, it was just a matter of time before contests would be won through such methods. I am glad to see that the other deep learning labs and companies are now also heavily using our algorithms.

InfoQ: The contests above were about pattern recognition – what method do you recommend for the more general field of reinforcement learning and sequential decision making without a teacher?

Schmidhuber: We like our Compressed Network Search, which goes beyond mere pattern recognition. It discovered complex neural controllers with a million weights and (in 2012) became the first method to learn control policies directly from high-dimensional sensory input using reinforcement learning. To go even beyond this, check out the above-mentioned report on “learning to think”.

InfoQ: What are your latest research interests regarding deep learning or artificial intelligence?

Schmidhuber: My latest research interests are still the ones I formulated in the early 1980s: “build an AI smarter than myself such that I can retire.” This requires more than plain deep learning. It requires self-referential general purpose learning algorithms that improve not only some system’s performance in a given domain, but also the way they learn, and the way they learn the way they learn, etc., limited only by the fundamental limits of computability. I have been working on this all-encompassing stuff since my 1987 diploma thesis on this topic, but now I can see how it is starting to become a practical reality.


InfoQ: NNAISENSE has received attention since its launch last year as a deep learning startup. As the president of the company, can you tell us more about NNAISENSE? What is your plan with this new venture?

Schmidhuber: NNAISENSE is pronounced like “nascence,” because it’s about the birth of a general purpose Neural Network-based Artificial Intelligence (NNAI). We have 5 co-founders, several employees, a very strong research team, and revenues through ongoing state-of-the-art applications in industry and finance (and we are also talking to investors). We believe we can pull off the big practical breakthrough that will change everything, in line with my old motto since the 1980s: “build an AI smarter than myself such that I can retire.”

InfoQ: How do you envision the development of the AI industry in the near future? In which areas do you believe new killer apps will emerge? Will there be a bottleneck?

Schmidhuber: At an AMA at reddit I pointed out that even (minor extensions of) existing machine learning and neural network algorithms will achieve many important superhuman feats in numerous fields ranging from medical diagnostics to smarter smartphones that will understand you better and solve more of your problems and make you more addicted to them. I guess we are witnessing the ignition phase of the field’s explosion. But how to predict turbulent details of an explosion from within? Assuming that computational power will keep getting cheaper by a factor of 100 per decade per Euro, in 2036 computers will be more than 10,000 times faster than today, at the same price. This sounds more or less like human brain power in a small portable device. Or the human brain power of a city in a larger computer. Given such raw computational power, I expect huge (by today’s standards) recurrent neural networks (RNNs) on dedicated hardware to simultaneously perceive and analyze an immense number of multimodal data streams (speech, texts, video, many other modalities) from many sources, learning to correlate all those inputs and use the extracted information to achieve a myriad of commercial and non-commercial goals. Those RNNs will continually and quickly learn new skills on top of those they already know. This should have innumerable applications, although I am not even sure whether the word “application” still makes sense here.
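The 10,000-fold figure follows from compounding the assumed factor of 100 per decade over two decades. The short sketch below spells out that arithmetic; the base year 2016 is an assumption inferred from the interview's own dating (it places Gödel's 1931 result 85 years in the past), and the function name is purely illustrative.

```python
def speedup_per_euro(start_year, end_year, factor_per_decade=100.0):
    """Compound speedup at constant price, assuming a fixed factor per decade."""
    decades = (end_year - start_year) / 10
    return factor_per_decade ** decades

# 2016 is an inferred base year; 2036 is the year named in the interview.
projected = speedup_per_euro(2016, 2036)
```

Two decades at a factor of 100 each give 100 x 100 = 10,000, so the projection is only as reliable as the assumption that the per-decade factor holds steady.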

InfoQ: So what’s the next step?

Schmidhuber: Kids and even certain little animals are still much smarter than our best self-learning robots. But I think that within not so many years we'll be able to build an NN-based AI (an NNAI) that incrementally learns to become as smart as a little animal, learning to plan and reason and decompose a wide variety of problems into quickly solvable (or already solved) subproblems, in a very general way. Through our formal theory of fun it is even possible to implement curiosity and creativity, to build unsupervised artificial scientists.

InfoQ: What will happen once you have animal-level AI?

Schmidhuber: The next step towards human-level AI may not be that huge: it took billions of years to evolve smart animals, but only a few million years on top of that to evolve humans. Technological evolution is much faster than biological evolution. That is, once we have animal-level AI, a few years or decades later we may have human-level AI, with truly limitless applications, and every business will change, and all of civilization will change, and EVERYTHING will change.

InfoQ: What’s the long-term future of AI?

Schmidhuber: Supersmart AIs will perhaps soon colonize the solar system, and within a few million years the entire galaxy. The universe wants to make its next step towards more and more unfathomable complexity.

Live Stream

Here is the live video feed of the first Go match between AlphaGo and Lee Se-dol. It's scheduled to start at 4am GMT tonight (March 9).

About the Interviewee

Prof. Jürgen Schmidhuber is Scientific Director of The Swiss AI Lab IDSIA, Professor at the University of Lugano (USI) and the University of Applied Sciences and Arts of Southern Switzerland (SUPSI). He received Diploma and PhD degrees in Computer Science from Technical University of Munich (TUM) in 1987 and 1991. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs (RNNs) developed by his research groups at IDSIA & TUM were the first RNNs to win official international contests. They have revolutionized connected handwriting recognition, speech recognition, machine translation, and image captioning, and are now used by Google, Microsoft, IBM, Baidu, and many other companies. DeepMind is heavily influenced by his former PhD students. Since 2009 Prof. Schmidhuber has been a member of the European Academy of Sciences and Arts. He has won many awards, including the 2013 Helmholtz Award of the International Neural Network Society, and the 2016 IEEE Neural Networks Pioneer Award. In 2014, he co-founded NNAISENSE, an AI company that aims at building the first practical general purpose AI.

(Special thanks to Tianlei Zhang for his support and assistance in preparing this interview.)
