Virtual Panel: Data Science, ML, DL, AI and the Enterprise Developer

Key Takeaways

  • Programs are getting more intelligent by harnessing the huge amounts of data that are available
  • Data Science (DS), Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI) together form a huge field, with some subtle and not-so-subtle differences between the topics
  • A variety of platforms aim to reduce the learning curve and don't necessarily require an intimate knowledge of the inner workings
  • Although startups and web-scale companies have already embraced these techniques, enterprises are not too far behind
  • On the topic of humans versus robots, it's a joint responsibility of society to innovate and develop algorithms and machines in an ethical manner

AI is making a huge comeback. It’s fascinating to be part of an era where a machine (or a cluster of machines) can take on a chess champion or a Jeopardy contestant and be able to win those contests handily.

The growing availability of computing power and of huge amounts of data is helping immensely. In this seemingly futuristic battle of man versus machine, enterprises have realized that they are sitting on a wealth of data that has not been effectively used so far.

Whether it’s predicting buying patterns or detecting faults in consumer equipment in advance, it's clear that adopting AI techniques would yield a significant competitive advantage to enterprise solutions. The race for cognitive solutions has thus already begun.

There are many reasons why enterprises are playing catch-up. First and foremost, developers consider AI to be in the same realm as rocket science, i.e., very hard to learn and with a significant learning curve.

The traditional methods of software development break down, since a set of input(s) might yield different output(s) depending on other ambient factors, and it would be hard to do test driven development, for instance.

Finally, there is no unified set of APIs for AI, and although different platforms use similar techniques it’s hard to migrate solutions from one to another.

InfoQ caught up with experts in the field, Prakhar Mehrotra of Uber, Dr. Jonathan Mugan, co-founder of DeepGrammar, and Kumar Chellapilla of LinkedIn, to demystify the different topics surrounding AI and to discuss how enterprise developers can leverage them today to make their solutions more intelligent.

InfoQ: I remember when I was in grad school learning Prolog and Lisp, AI was poised to take off but that simply never happened. Why so? Now, AI seems to be sexy again. Is this a passing trend or is it here to stay?

Mehrotra: It’s here to stay! It has always been hot in the academic community, but it has gained massive popularity in the industry due to investments made by tech giants like Google, Facebook, Amazon, Microsoft and Apple in products like Siri on our iPhones, Alexa, Google Home and, recently, autonomous vehicles. Over the last decade, we have seen two massive technology shifts: a switch to mobile, and the advent of cloud. Both of them have led to an explosion in the amount of data that is generated. Cheaper computing and hardware costs have enabled researchers at universities and in the industry to improve the performance of one of the primary AI algorithms: neural networks. This has resulted in higher accuracy for real-world applications (e.g., image classification and search, text mining, and voice recognition). The other thing that is contributing to its popularity is the cost. Tech giants like Google have the philosophy of democratizing AI (e.g., TensorFlow), and as a result, anybody can start playing and experimenting with these algorithms. So the higher accuracy of algorithms and the cheaper cost of running them are also reasons why AI is here to stay.

Mugan: The long-term trend is towards smarter machines, but these things are cyclical. AI is big right now in part because deep learning has created a new hammer, and as AI practitioners keep finding things to hit with that hammer, society sees continual benefits. Prolog and Lisp are symbolic systems. Symbolic systems are brittle because you essentially have to code in everything a computer will need to know, and there always seems to be a new case you forgot about. But symbolic systems are great at encoding prior knowledge and distinctions, and an open problem right now is how to best combine them with deep learning.

Chellapilla: AI is continuously getting better. The gap between expectation and reality is really a reflection of our optimism around how quickly science will advance to achieve our SciFi dreams. We grew up dreaming of space travel, communicators, flying cars, and HAL becoming a reality in our lifetimes. In the 1960-80s wave of AI, we grossly underestimated the difficulty of some of the AI problems. Two good examples are speech recognition and object detection and recognition. Both took multiple decades to solve. It is only recently that we’ve been able to solve them well enough to not only compete, but also to exceed natural human capabilities. These recent breakthroughs were driven by developments in greater compute power, access to huge amounts of data, and advances in deep learning. Today, the average AI developer or scientist can harness over a million times more compute (GigaFlops => PetaFlops) and a million times more data (GigaBytes => PetaBytes). This is very exciting! The maturation of open source toolkits and cloud technologies has also dramatically increased the size of the AI community and the pace of innovation.

InfoQ: The spectrum of cognitive computing spans Data Science, Machine learning, Deep Learning, AI and so on. Are Computer Vision, Neural networks and Deep Learning synonymous? Can you clarify these areas succinctly and how understanding this distinction might help developers and architects in designing solutions?

Mehrotra: Good question! Yes, in my experience in the industry, I have seen the terms AI, Machine Learning, and Deep Learning used synonymously, especially in business settings. Actually, they are not synonymous at all. Artificial Intelligence (AI) is a broader field of study that aims to build systems that are "intelligent". If the system aims to mimic human intelligence and think like humans (e.g., passing Turing tests), then it is called "Artificial General Intelligence (AGI)" or "strong AI". The difference between AI and AGI is the scope of the problem and the modeling realm. Machine Learning (ML) is a branch of applied mathematics and one of the techniques used to build an AI system. Search, Optimization, Control Theory, and Logical Reasoning are other AI techniques. Machine Learning differs from classical statistics in placing less emphasis on confidence intervals, the bread and butter of a classical statistician. Deep learning is one of the algorithms that fall under the ML bucket. Other examples of ML algorithms are Logistic Regression, Bayesian Networks, and Support Vector Machines. Neural Network (sometimes also referred to as Artificial Neural Network) is a loose term that in the academic community translates to a feedforward neural network. When a network has many layers, it is called "deep". So to recap: AI is a superset, ML is a subset of AI, and Deep Learning is a subset of ML.

Mugan: AI is about making computers smart. Machine learning is about computers that get better at a certain task with experience. Machine learning is one way to achieve AI; the other way is to program the smarts in directly. Neural networks and deep learning are one way to do machine learning. Deep learning and neural networks are pretty much synonymous. As neural networks became larger and larger, we started calling that method deep learning. Computer vision is one of the many capabilities that computers need to obtain to be smart. You can achieve computer vision with machine learning, and you can implement that machine learning using neural networks (deep learning). Data Science is about making data useful to human decision makers. It may involve AI and machine learning, but its goal is to allow a human to make a decision.

Chellapilla: Very confusing, isn’t it? They are related but not the same, though their usage might seem to indicate so. AI is the grand vision. It is the broadest of them all and includes several human-programmed and machine-learned areas. In contrast, Machine Learning comprises a suite of tools and technologies for machines to learn from large amounts of data with little intervention from their human creator. For example, Deep Blue and AlphaGo are game-playing programs built using AI. Deep Blue beating Garry Kasparov and winning the man-machine chess championship is a good example of an AI technology built using human-programmed intelligence, rather than machine learning. AlphaGo, on the other hand, makes heavy use of machine learning and won its first game against a human professional player just last year (2016). AlphaGo uses deep neural networks, a certain class of machine learning models, though not for computer vision purposes. So, computer vision, neural networks, and deep learning are not synonymous. Deep learning is basically Neural Networks 2.0: bigger, better, and deeper, i.e., with more layers. Computer vision is AI technology that allows machines to "see", i.e., gain a high-level understanding of images, videos, and 3D spaces. Problems here involve detecting interesting things like faces and objects in images, being able to track them in real time, and building up a 3D model of the world around them to predict outcomes. Today, convolutional neural networks, a sub-type of deep nets, are the best machine learning models for building AI for computer vision.
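
To make Mugan's description of machine learning ("computers that get better at a certain task with experience") concrete, here is a minimal sketch in plain Python. It is not taken from the panel: it trains a single artificial neuron (a perceptron) on the logical AND function, and the names `train_perceptron` and `AND_SAMPLES` are illustrative assumptions. A deep network stacks many layers of such units, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[int, int], int]

# Illustrative training data (an assumption, not from the panel): logical AND.
AND_SAMPLES: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights: Sequence[float], bias: float, x: Tuple[int, int]) -> int:
    """Fire (1) when the weighted sum of the inputs is positive."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def train_perceptron(samples: Sequence[Sample], epochs: int):
    """Classic perceptron rule: nudge the weights after every mistake."""
    weights = [0.0, 0.0]
    bias = 0.0
    errors_per_epoch = []
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights[0] += delta * x[0]
                weights[1] += delta * x[1]
                bias += delta
        errors_per_epoch.append(errors)
    return weights, bias, errors_per_epoch


weights, bias, errors_per_epoch = train_perceptron(AND_SAMPLES, epochs=10)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
```

Nobody writes an "AND" rule here; the behavior comes from repeated exposure to examples. The error count is not guaranteed to fall on every pass, but because AND is linearly separable the perceptron eventually stops making mistakes on these four examples.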

InfoQ: Underlying these different areas is the availability of data. Most enterprises complain that they have so much data and they are only using just a fraction of the available data. What is stopping the usage of more data? Is there a point when there becomes too much data?

Mehrotra: Manpower! The specific kind of people who know how to make sense of data are called "Data Scientists". Turns out there is a massive shortage of data scientists. Universities are still catching up with the demand. I was reading a report on big data by McKinsey and remember it saying that by 2018, there will be a shortage of around 150k data scientists in the US alone.

Mugan: I don’t think it is a question of too much data. The challenge is getting the data formatted so that it can be fed into computers. And it is hard to come up with algorithms that make the best use of it. Once you have the right algorithm, you never seem to have enough data.

Chellapilla: The biggest blocker is the quality and accessibility of the data. When enterprises say they have a lot of data, they are usually referring to raw data which is very noisy and is not in a state that is ready for consumption by today’s programs and programmers. Companies need to invest data science and engineering resources to build out pipelines and platforms that digest this raw data into structured and semi-structured forms that can then be ingested by traditional programs. These transformations typically make heavy use of statistics and machine learning algorithms. Once companies put in this investment, there is no such thing as too much data.
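
As a rough illustration of the kind of pipeline step Chellapilla describes, the sketch below turns noisy raw lines into structured records and drops what cannot be parsed. The `Purchase` record, the `user,amount` line format, and the sample data are all assumptions made up for this example; real pipelines would also log or quarantine rejected rows rather than silently discard them.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Purchase:
    """Hypothetical structured record produced by the pipeline."""
    user_id: str
    amount_usd: float


def parse_line(line: str) -> Optional[Purchase]:
    """Turn one raw 'user,amount' line into a Purchase, or None if unusable."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 2 or not parts[0]:
        return None  # wrong number of fields, or missing user id
    try:
        amount = float(parts[1].lstrip("$"))
    except ValueError:
        return None  # amount is not a number
    if amount < 0:
        return None  # negative amounts are treated as noise in this sketch
    return Purchase(user_id=parts[0].lower(), amount_usd=amount)


def clean(lines: Iterable[str]) -> Iterator[Purchase]:
    """Yield only the lines that parse into valid records."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield record


# Made-up raw input: two usable rows and four noisy ones.
RAW_LINES = ["U1, $12.50", "u2,abc", ",3.00", "U3,7", "garbage", "u4,-1"]
CLEANED = list(clean(RAW_LINES))
```

Only the structured `CLEANED` records would be handed to downstream programs or models; the raw lines never reach them.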

InfoQ: Is the learning curve for AI too high? What are the AI platforms and APIs that do a good job of abstracting the complexity that developers and architects should pay attention to? was formed by a bunch of vendors and 3rd parties to collaborate on AI. Can you comment on this effort and other efforts to simplify the lives of developers?

Mehrotra: It depends on the kind of AI system one is envisioning. For example, if you want to build an AI to play a game of chess, or to find the optimal or shortest path, or to decide whether to give a loan or not (a credit-default recommender system), the hard part is not the science, but the computing power. So in that sense, the learning curve is not steep. However, if you are building a system that will decode a new language for you, or you want to build self-driving cars, then yes, I would say the learning curve is steep. For instance, in the case of language processing, we can use statistical techniques to decode and understand a part of the sentence and the patterns in it, but in order to interact in that language we need to understand context, something that humans have mastered but machines have yet to catch up on.

In terms of the platforms that abstract complexity, IBM Watson stands out. It has done an amazing job of letting developers outside Silicon Valley quickly jump on the machine learning bandwagon and adopt it for their specific use cases. Another good API for pattern recognition that I have been exposed to is the Google Prediction API.

Mugan: There is a lot to learn when it comes to AI and machine learning, but there are a ton of resources online to get you started. Many times, you can get a lot of benefit with simple algorithms. We seem to want to always start with the most powerful deep learning algorithm, but that usually isn’t required. Simple algorithms like decision trees work great in many cases. I can’t comment on specifically.

Chellapilla: The learning curve for building AI solutions is getting less steep. Several basic ingredients for building AI systems are widely accessible via open source packages and cloud APIs. Good examples here are Google and Microsoft cloud APIs that provide text, image, video, speech, and machine translation capabilities. These are advanced enough for engineers to quickly build simple AI powered user experiences and products. However, many products require either custom engineered AI solutions, or even AI research and breakthroughs, to be successful. Once again open source tools such as R, scikit-learn, TensorFlow, etc are helpful here, but require non-trivial skills to master.

The hardest part of the learning curve for developers is changing their design thinking. This is needed for building non-zero fault systems that only offer statistical performance guarantees. In addition to understanding APIs, an AI developer needs to also grok math topics such as probability, statistics, and optimization. This is necessary for iteratively improving their AI system to make fewer mistakes. Most importantly, they have to unlearn the gut instinct to add a rule to fix an error the way they would fix a traditional software bug.
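
One way to picture this shift in design thinking is in how such a component gets tested. Instead of asserting an exact output for each input, a test can assert that measured accuracy over many samples stays above an agreed threshold. The sketch below is an assumption-laden illustration: `noisy_classifier` is a hypothetical stand-in for a trained model that is right about 90% of the time, and `MIN_ACCURACY` is an arbitrary bar chosen for the example.

```python
import random

MIN_ACCURACY = 0.85  # illustrative threshold, not a recommendation


def true_label(x: float) -> int:
    """Ground truth for this toy task: is x in the upper half of [0, 1)?"""
    return 1 if x >= 0.5 else 0


def noisy_classifier(x: float, rng: random.Random) -> int:
    """Hypothetical stand-in for a trained model: correct about 90% of the time."""
    truth = true_label(x)
    return truth if rng.random() < 0.9 else 1 - truth


def measure_accuracy(n_samples: int, seed: int) -> float:
    """Evaluate the classifier on freshly drawn samples; seeded for repeatability."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        x = rng.random()
        if noisy_classifier(x, rng) == true_label(x):
            correct += 1
    return correct / n_samples


measured = measure_accuracy(n_samples=5000, seed=0)
passes_quality_bar = measured >= MIN_ACCURACY
```

Individual predictions are allowed to be wrong; the check fails only if the aggregate drops below the bar. When it does, the remedy is usually better data, features, or models rather than a special-case rule for the failing input.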

InfoQ: Supplementing these APIs, what are the AI related development tools and toolkits that Data Scientists and Developers should be aware of when working on data science projects?

Mehrotra: TensorFlow from Google, the Theano library in Python, and Deeplearning4j for folks who use Java and the JVM. The rpart and glmnet packages in R are also useful for quick hacks to test an idea.

Mugan: I live in the Python world. I use scikit-learn and TensorFlow a lot. For natural language processing, spaCy is a great choice.

Chellapilla: For someone starting out, I’d suggest picking up R, scikit-learn, NLTK/spaCy, and TensorFlow. For intermediate to advanced users, there are myriad options out there. It is a very rich and dynamic ecosystem that continues to evolve.

InfoQ: It's always fascinating to predict the future, such as buying habits, maintainability issues and so on.  Can you comment on the advances in this area? How sophisticated are these algorithms and methods today? How can developers and architects incorporate them into their solutions?

Mehrotra: Niels Bohr said, "Prediction is very difficult, especially about the future". The hard part about prediction, when it involves a human, is the context. For example: I order pizza from, say, the same pizza chain every Friday, around 8pm. I have done so for the last two Fridays, so a recommender system thinks that I will do the same this coming Friday. A data scientist at this chain has two choices: he or she can use this information to send me a pop-up on my mobile screen, or can automatically order a pizza for me and send me a notification that a pizza is on the way. Well, if I really wanted pizza this Friday too, then voila! This is an amazing AI that can predict my behavior. However, if I already had pizza, say, at work, and I am in no mood for it again, then I might not be happy and might start questioning the AI. Hence, most companies resort to a financially less severe alternative: a pop-up on the mobile screen.

Regarding the sophistication of algorithms, in my experience it has been more about a stronger signal than a sophisticated algorithm. For example, an ensemble of logistic regression and random forest might do a better job of predicting loan default probability than complex neural networks if my training data does not suffer from the class-imbalance problem (i.e., I have enough default cases in the data I am using to train the algorithm). Better data trumps a complex algorithm.
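
Two ideas from this answer can be sketched in a few lines of plain Python: checking how many default cases the training data actually contains, and combining two simpler models by averaging their predicted probabilities. Everything here is hypothetical: the labels and scores are made up, no models are actually trained, and averaging is only one of several ways to build an ensemble.

```python
from typing import List, Sequence


def positive_rate(labels: Sequence[int]) -> float:
    """Fraction of examples labeled 1 (here: loans that defaulted)."""
    return sum(labels) / len(labels)


def average_ensemble(probs_a: Sequence[float], probs_b: Sequence[float]) -> List[float]:
    """Combine two models by averaging their predicted default probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same examples")
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


# Hypothetical training labels: 1 = defaulted, 0 = repaid.
train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
default_rate = positive_rate(train_labels)

# Hypothetical default probabilities for three applicants, as they might come
# from a logistic regression and a random forest respectively.
logistic_scores = [0.25, 0.75, 0.5]
forest_scores = [0.75, 0.25, 1.0]
ensemble_scores = average_ensemble(logistic_scores, forest_scores)
```

A very low `default_rate` would be a warning sign that the data, not the algorithm, is the limiting factor.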

Mugan: Recommendation engines try to predict what we will want next, such as on Amazon or Netflix. It’s not clear how far we can take these algorithms. If you remember the 1997 movie The Game with Michael Douglas, they could predict his every move. I don’t think it’s possible to predict individual human behavior with that kind of precision.

Chellapilla: Predicting the behavior of specific users is hard, actually quite hard. However, understanding and predicting the trends and behaviors of cohorts of users is quite advanced. Online advertising and revenue forecasting are the state of the art here. These ad matching and serving algorithms rely heavily on user behavior signals. The more the user shares with these algorithms (e.g., your Amazon Prime account purchase history, or your YouTube viewing history), the better they become. From an advertiser’s perspective, these algorithms are good at predicting audience sizes and campaign spend rates, and at providing insights such as how much advertisers should bid. AI tech-wise, these solutions require robust algorithms that not only leverage historical data, but also respond quickly to changes in user behavior online. While the models are simple, these requirements produce quite sophisticated implementations with combinations of cold start, warm start, and real-time learning to track intra-day trends in online purchasing behavior.

InfoQ: Many startups are already leveraging Deep Learning. Why is Enterprise adoption still relatively slow? Can you provide some details of well-known or internal enterprise application(s) that were significantly improved as a result of incorporating some of the AI techniques?

Mehrotra: The reason they are slow is that the platforms I have described above offer easy implementations of only the basic techniques. Companies use these, but quickly run into issues (e.g., the lack of a data science department) when modifying them to suit their specific needs. The key to the success of any enterprise solution is support, and I think Cloudera and Hortonworks are out there in this area, overcoming these barriers.

Mugan: Companies both big and small are using deep learning. Deep learning is currently best for applications related to perception, such as computer vision. Most companies deal with more structured data, and it is still unclear how to leverage deep learning to gain insights from a large SQL database, for example.

Chellapilla: Deep Learning is quite widely used in enterprises. Startups tend to produce niche products and typically build solutions from scratch. So, it’s reasonable that they directly start with deep nets. Enterprises building data products usually already have existing AI and ML solutions in place that are doing well in their domain. The KISS principle dictates that if something simpler, like a logistic regression model, is already in production and is doing well, there is no need to look for more complex models. So, you’ll see enterprises use deep nets more judiciously and apply them to problems where there are promising gains. Enterprises are slow in one regard though, due to data scale and privacy. Unlike startups that can leverage cloud APIs and services from day one, enterprises might have data scale and customer privacy constraints that don’t allow for shipping their internal data over to cloud services as easily. This requires them to build a custom solution in-house, which can be slower. They also need to solve scale problems early. An enterprise building a deep learning solution in-house that scales to hundreds of millions or billions of users will understandably take longer.

InfoQ: Can you discuss some best practices for developers who are working on Deep Learning or AI projects?

Mehrotra: Scope out the problem and clearly define what you want your system to do. Have a hypothesis on the outcome. On the engineering side, make sure you have a robust data pipeline and that it remains solid throughout the course of your project. If you don't have data or cannot reliably get the data, you will end up spending time fixing that. You want your data scientists to spend time on science, and not on maintaining data. On the project management side, start with a simple objective. Demonstrate this to your stakeholders and win early trust. Once you do that, go back and start investing in feature engineering. As Martin Zinkevich from Google says: good gains come from good features and not complex algorithms!

Mugan: Start simple. First use the simplest possible algorithm and run the process all the way through and evaluate the results. After you have done this, you can iteratively improve the results with more sophisticated algorithms.
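
A minimal sketch of what "start simple" can look like in practice: before any real model, run the whole train-and-evaluate loop with the most trivial predictor available, one that always guesses the most common training label. The data below is made up, and the function names are illustrative assumptions. Any later, more sophisticated model then has a concrete number to beat.

```python
from collections import Counter
from typing import Sequence


def majority_baseline(train_labels: Sequence[int]) -> int:
    """The simplest possible 'model': always predict the most common label."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up labels standing in for a real train/test split.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_labels = [0, 1, 0, 0]

baseline_label = majority_baseline(train_labels)
baseline_accuracy = accuracy([baseline_label] * len(test_labels), test_labels)
```

If a complex model cannot clearly beat `baseline_accuracy` on held-out data, that is usually a cue to revisit the data and features before reaching for a more powerful algorithm.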

Chellapilla: My suggestion would be to start by understanding deep learning, neural networks, and your specific problem domain. Deep learning is one approach to building AI solutions. It is just as important for AI developers to understand other techniques like linear models, logistic regression, trees, ensembles, etc. Whether deep learning is the best option or not depends a lot on the problem you are trying to solve and the desired properties of the AI solution. Often domain expertise and simplicity trump other factors. Simple models are easy to debug and tune. Product insights and experimentation are valuable for engineering good features and are as important as picking the right model and learning algorithm.

InfoQ: Final Question. Can you paint the future of AI? How would you assuage the concerns of the extremely paranoid that machines will eventually take over the world and make human beings expendable?

Mehrotra: Humans should build AI systems that are helpful for human society at large. Machines won’t take over the world, nor will any Transformers. No missile or nuclear bomb by itself will wipe out the world. Rather, it is all in the hands of humans and their judgment whether to use the missile or not. I think the paranoia about machines taking over the world may have its origin in the theory that machines (by way of automation) will replace jobs, which humans translate into a fear of machines taking over the world. The 2017 Asilomar conference did a very good job of laying down the guiding principles for AI research and ethics.

Mugan: It seems impossible to me that machines will ever be smarter than humans, but it also seems inevitable. Machines get smarter every year while we stay about the same. We fear that machines will take over because that’s what we would do. But machines didn’t evolve through evolution’s tooth and claw like we did; they were designed to help. Overall, I fear humans more than machines. Our society has a lot of problems that need to be solved, and I hope that machines can help us prosper. I also hope that we someday have computers that are smart enough to explain the mysteries of the universe.

Chellapilla: Fear is the wrong emotion to bring to AI developments. Today, it is more of a reflection of our trust in machines than of the dangers inherent to AI. Humans have always been tool builders and tool wielders. Over time these scientific and technological innovations have brought a lot of prosperity and happiness to humans. I believe this will continue to be the case. AI will make these tools much more versatile, independent, and powerful. That said, with this increased power comes increased responsibility. Power is intrinsically neither good nor evil. So, as we build more powerful AI systems, the onus is on us, the creators, to also ensure that this AI inherits not only human attributes like raw intelligence, but also our social and emotional values. The latter is key for producing a socially responsible AI that humans can learn to fully trust.


The opinions above show some diversity and a lot of commonality. For someone who’s getting started in the field, the terminology can be overwhelming, and the distinctions between Data Science, Machine Learning, Deep Learning and AI are not always clear. However, this may not matter significantly, since the approach is similar: fine-tune algorithms based on a continual feedback loop.

Most panelists agree that the learning curve in this area is getting significantly less steep and that there are a number of platforms for getting started relatively fast. Although startups have already embraced the techniques outlined in the panel, enterprise adoption is not very far behind.

As most panelists recommend, start off small with clearly defined requirements, a model and a hypothesis. Validate the hypothesis and continually fine tune the model based on the outcome(s). It might help to understand some of the inner workings of these algorithms although today’s platforms make it easier for developers and architects to play with these algorithms and gain a better understanding by trial and error.

On the topic of AI superseding human intelligence, the panelists unanimously conclude that we, the builders of these intelligent systems, have an ethical responsibility to ensure that if and when it does happen, the power of these systems is harnessed for the greater good. As with a lot of other similar issues, it’s up to society to define these bounds while the technologists continue to innovate at a frenetic pace.

About the Panelists

Prakhar Mehrotra is currently Head of Data Science - Finance at Uber, leading a team of researchers. His research group focuses on forecasting, optimization and building simulations to better understand network effects in a marketplace. Prior to Uber, he was a Sr. Quantitative Analyst at Twitter as part of the Sales and Monetization team, working on building forecasting algorithms to predict revenue. He has graduate degrees in Aeronautics from the California Institute of Technology, Pasadena, and from Ecole Polytechnique, Paris.

Dr. Jonathan Mugan is co-founder and CEO of DeepGrammar. He specializes in artificial intelligence and machine learning, and his current research focuses in the area of deep learning, where he seeks to allow computers to acquire abstract representations that enable them to capture subtleties of meaning. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis work was in the area of developmental robotics where he focused on the problem of how to build robots that can learn about the world in the same way that children do.

Kumar Chellapilla is the Head of Monetization Relevance at LinkedIn and works on improving relevance for LinkedIn's paid products such as Recruiter, Premium Subscriptions, Ads, Sales Navigator, ProFinder, and Referrals. He is passionate about driving product and research innovation by solving hard problems in the areas of machine learning and artificial intelligence. Prior to LinkedIn, Kumar worked on Ads Quality at Twitter and Web Search at Bing and Microsoft Research. He has a Ph.D. from the University of California, San Diego, where he worked on training neural networks for game playing.
