
Document Digitization: Rethinking OCR with Machine Learning

Summary

Nischal Harohalli Padmanabha outlines the problems faced building deep learning networks for the information extraction process at omni:us, their limitations, the evolution of team structures and engineering practices, and other topics.

Bio

Nischal Harohalli Padmanabha is currently the VP of Engineering and Data Science at the Berlin-based AI startup omni:us, which builds AI products for the insurance industry. Previously, he was a co-founder and data scientist at Unnati Data Labs.

About the conference

QCon.ai is a practical AI and machine learning conference bringing together software teams working on all aspects of AI and machine learning.

Transcript

Padmanabha: I'm Nischal, I'm the VP of engineering and data science at Omnius, a Berlin-based AI company. I think this talk could be slightly different from everything that you've seen in the last two days, because we're working in a very different space. We're an AI company that's actually helping insurance companies automate their claims handling. What does this really mean?

Before we get into what it really means, this quote, for me, really stands out, because about two years ago, when I joined Omnius, we were just a team of three people. One of the first things that we ever did was to start writing something with rules. This quote by Geoffrey Hinton is very important for us because every time you get into trying to solve a problem, as humans, by nature, we want to have everything in our control, and the first thing we do is write rules.

Writing rules is not bad, because it gives you a baseline of hope for what your system can be, and from there you can figure out how you want to evolve and go ahead. What is one of the problems that we're trying to solve at Omnius in claims handling? We are trying to understand unstructured documents and extract semantic information to automate claims handling. There's often confusion when someone says document digitization: going through a digitization process is not merely running a document through an OCR system, because all that does is add another layer on top from which you could copy and paste something. It doesn't really serve the purpose of digitization, nor does it give you a handle on what is in this document and how you want to process it.

If we solve this successfully, this is what it would look like. Someone pushes a claim through one of your insurance companies - I'm sure some of you in the audience have tried to settle a claim at some point in your life - and you send your policy, you send your invoices, you send your medical bills, or if you have a car accident, you send details from a car repair shop. Even now, a lot of you, I'm sure, have sent it physically. Have you ever thought about what happens behind the scenes? Behind the scenes, it goes to a sorting center. Someone manually takes this information up, figures out which department it needs to go to, and someone in that department picks this document up again, feeds it into some form of system, and then it goes to another department and another department and another department.

Then finally, someone contacts you six weeks later and says that you've not submitted an invoice, and the process takes another six weeks before you actually have your claim settled. We are trying to change this; we are trying to change how insurance companies, being so process-driven, can start being data-driven. And we are achieving this with AI.

Rewind

I'll rewind two years back to how we started off, and this was the conversation that I had with [inaudible 00:03:12] as well, in terms of how we historically changed from what we were trying to do to where we are today. From this entire process, I would like to take a small fraction: tabular information extraction. It seems straightforward - when you look at tables, extracting information from them seems super easy. But even if you just look at this UI, the construction of rows, the construction of columns - what exactly is a column? The information and the text wrapping are different. Imagine every car repair shop, every medical practice, every hospital, every clinic in the world, or anything that gives you an invoice or a document: they have their own way of representing tables. There are no columns, there are no rules; there's information that's present somewhere in the document and you want to extract it.

What did we do? We were super arrogant engineers. We said, "You know what? This looks like an easy problem. It's so funny that people haven't solved it. Let's write a bunch of rules." So we wrote a lot of heuristic rules, and the initial results were awesome, because we were just testing on the data that we had, and we kept a few samples aside. On this evaluation we were like, "Ok, we're confident, and it's so surprising that someone hasn't cracked this yet and commoditized it."

In reality, however, when we gave this to our customer and they ran it through our system and we saw the results, they were like, "Are you sure you really solved the problem? You made it worse for us, because it would take us less time to have a human go through these documents rather than use your system. The idea was you would give us a system that would help us automate things and make things faster."

We felt miserable, we had failed, and we didn't know what else to do. The first action for us was, we said, "Ok, we're going to write a few more rules, we're going to try and handle some of the edge cases," and at a certain point in time we realized, "Oh my God, this is just cumbersome and brittle." We had a few consultants who were working with us - we were just a team of three people - and they left. We didn't even understand the Java code that was there, and we were like, "Oh my God, we can't do this anymore."

This was also a life-or-death situation for the project, as well as for us as engineers, because two years ago we were still a seed-funded startup trying to win our first customers, and it was very important that we really got this right.

What did we do? We took a step back and said, "Look, there are two ways to go about this. One, we could hire 20 engineers and keep building rules, survive the first year somehow, and then figure out what we want to do. Or, let's think about how a human would solve this problem, see why we can't write systems that do the same, and actually be happy if an algorithm wins over a human."

If I were to hand a document to you right now, the first thing you would normally do is just look at it, irrespective of what language it's in, and your brain is already analyzing: "Ok, this looks like a paragraph, this looks like a table structure, this looks like certain information that's important. This is a header, this is a photo," irrespective of what document we give you. Then, bringing in some more domain knowledge, you know what you really want to extract, and given this context, you know what information is relevant for you to understand what these documents are all about.

We took the same approach, and from here on it's straightforward: we use computer vision, we use natural language processing, and boom, the problem is solved. In parallel, what I also want to show in this presentation is how our tech stack as an AI company evolved. We did a bunch of things in Python, we were doing Bash magic - we have a crazy engineer whose Bash skills are unparalleled, and it's very hard for anyone else to understand what's going on, but it does a lot of cool things - and then we wrote a lot of pipelines in Java.

Next Steps

Now, the next steps. Given that we want to solve this with computer vision and natural language processing - and I'm sure a lot of you go through the same thing - you go online and you type "state of the art computer vision algorithms," and you get, I don't know, 300 million hits. You are lost on what you want to use, because it's not like you can train every algorithm and figure out what's going on, and the same goes for natural language processing. Even if you do figure out what you want to run, what is the data that actually goes in? This becomes very hard, because people are either working only on images or only on free text.

How do you combine something that's an image with something that's free text, put all of these things together, figure out what labeled data you really need, and feed this data into your systems? As a startup, you're always running against deadlines. You don't want a 10-year research project; you want to do iterative steps in research, take them into production, and meet your deadlines.

Something else that needs focus and attention is the human and computation resources. You don't just need engineers; you also require a lot of computation resources, especially when you want GPUs and you're running multiple experiments in parallel. How do you do this? Putting all of these pieces together, how do you make this agile? How do you take things forward?

Let's go one step at a time. What algorithms to use? From the beginning, we decided the unsupervised way was not for us, because if we didn't know what we wanted from the system, we couldn't expect the system to tell us what exactly is right or wrong. However, one of the data scientists on our team came up with an idea. He said, "Look, we cannot use unsupervised systems simply because we don't know what we want as an outcome. However, we can use unsupervised systems to actually generate training data for us." One of the first steps in the extraction process itself is to understand what a document is. Is it a claim? Is it an email? Is it an invoice? I don't know how many of you attended Joel's keynote yesterday - he was talking about contextual vectors.

What we did was take big models that had already been pre-trained on something, run our documents through them - on both the computer vision side and the natural language processing side - take one of the layers, and run very simple k-means on top of it. We had a situation where we had to classify 450,000 documents into 30 different bins, and if we had given this as an annotation job to somebody, it would probably have taken about two months to get it done.

The unsupervised algorithm didn't really give us labels. We knew how many classes we required; taking this unsupervised route, we generated buckets where we could see the homogeneity of the clusters, and we used this information to feed the supervised learning mechanisms. We do the same thing with both computer vision and natural language processing. This speeds up annotation, and it gives you an idea of what your documents are, what your information is, and how you want to take things forward.
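
As a rough illustration of that idea, here is a minimal sketch of clustering document embeddings with k-means to pre-bucket them for annotation. In the real pipeline the vectors would come from an intermediate layer of a pre-trained vision or language model; random vectors stand in here so the snippet runs, and the cluster count of 30 matches the bins mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for ~450,000 document embeddings taken from one layer of a
# pre-trained model (vision and/or language side).
doc_vectors = rng.normal(size=(1000, 256))

# We roughly know how many bins we expect (about 30 in the talk).
kmeans = KMeans(n_clusters=30, n_init=10, random_state=0).fit(doc_vectors)

# Cluster ids are not labels, but a homogeneous cluster lets an annotator
# confirm or correct a whole bucket at once, which is what speeds up labeling.
print(np.bincount(kmeans.labels_, minlength=30))
```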

On the supervised learning side, we had decided on both computer vision and natural language processing. On the computer vision side, I'm sure everybody in the audience has heard about autonomous cars. We thought, if cars traveling at 70 or 100 miles an hour can detect everything on the street - this is a car, this is a bird, this is the road, this is the lane - why can't we do that on documents? We could train systems to do object detection on pages and say, "This is a paragraph, this is a table, this is a header, this is a photo." We started seeing that this was working really well for us, and we didn't even have to take in big networks.

We took one network called SqueezeDet, which is a very fast object detection network, compressed it a little more, removed some of the layers that we thought were not required, and after labeling we were able to run it successfully on hundreds of thousands of documents for object detection in a few hours. In parallel, we started doing research work as well, because this was a short-term/long-term trade-off. In the short term, we were very happy with what object detection was doing, but we also saw some of its limitations, especially with documents that are skewed or scans that are in a bad state; say somebody has taken a WhatsApp photo and there's a reflection on the document and you can't really see the text or the information that's there.

We started developing message passing networks, where we take a different view of how the document itself is structured, as an interconnection of nodes, and we have a deep learning system that learns the relationships and identifies what these nodes have to be, to figure out whether something is a paragraph, a passage, or one of the other elements on the screen. Of course, we also wrote a few custom CNN implementations.

However, on the natural language processing side, despite the advancements, we still couldn't find anything that we could use out of the box. Yes, the base for natural language processing with deep learning is quite well established now, with contextual vectors and a lot of different things out there. You have ELMo embeddings, you have BERT embeddings, you could also use fastText embeddings; these things are cool, but they don't really solve your problem. This led us to write our own implementations of RNN-CNN networks, and we also have a few implementations of deep topic modeling in place. We decided we would go down this path until we found something we could use out of the box.
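
To make the RNN-CNN idea concrete, here is an illustrative sketch only, shown in PyTorch for brevity: a small text classifier that runs a 1D convolution over token embeddings and a GRU over the convolved features. The layer sizes and the architecture itself are assumptions for illustration, not omni:us's actual networks.

```python
import torch
import torch.nn as nn

class CnnRnnClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, n_classes=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The convolution picks up local n-gram patterns over the embeddings.
        self.conv = nn.Conv1d(emb_dim, 64, kernel_size=5, padding=2)
        # The GRU then models longer-range order over the convolved features.
        self.rnn = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.out = nn.Linear(128, n_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids)                     # (batch, seq_len, emb_dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, 64, seq_len)
        x, _ = self.rnn(x.transpose(1, 2))            # (batch, seq_len, 128)
        return self.out(x.mean(dim=1))                # pool over time, classify

logits = CnnRnnClassifier()(torch.randint(0, 30000, (4, 200)))
print(logits.shape)  # torch.Size([4, 30])
```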

How many of you here know Richard Socher? For those of you who don't, he is the VP at Salesforce; he was running his own company in Palo Alto that came out of the Stanford labs, and he's one of the people who's really famous in the natural language processing space, having worked with Christopher Manning and others for quite some time. This is also where we realized this was the right way for us to go, because we were not really confident in unsupervised systems.

We said, "Ok, we want to do supervised learning and we are very sure about it." but in order to do supervised learning, you need to label data. How do you label data for something that is complex, something that both you need the computer vision aspect of it, you need the text aspect of it, and you want to combine this together? For the computer vision aspect, you want to be able to draw polygon bounding boxes, you want to be able to label pages, you want to be able to label documents. Whereas in the natural language processing side, you want to be able to have hierarchy of information, how information is grouped together, how you connect the dots, and what is it that you're really trying to extract? We decided we'll build our own annotation system. We really tried to search and we adopt most of the open source technology as much as possible, but we couldn't find one that was sufficient enough for our needs.

A lot of them have capabilities that work only on the image side of things and not really on the text side, so we decided to build our own annotation system and also to support workflows around it. When you have annotation jobs with hundreds of thousands of documents that someone has to manually annotate, you need a lot of workflows around this as well. It's already in its third iteration now, and I think in the next six months or so, we'll probably open source the entire annotation system.

For human and computation resources, we wanted to hire data scientists, engineers, leadership, and mentors, and also be part of cloud startup programs. On the data scientist side, we had already hired a few people from academia, and we decided we wanted to hire the rare breed of deep learning engineers, simply because we were trying to gain speed and agility in the experiments we wanted to do and in how we wanted to take these things forward.

We also started setting up research programs with universities. Something that really worked out great for us is that last year we offered two students sponsorship for their master's theses, on research work they could carry out at Omnius that had a direct impact on the problems we wanted to solve. This worked out great for two reasons. One, their contributions were immense, and with their focus this basically propelled us to the next stage. Two, they really loved working on the problem, so they immediately converted to full-time employees, which reduced the time for onboarding and understanding the culture, since they already knew what problems to solve. This was a great thing that really helped us.

In parallel, something that we paid attention to, especially when trying to build a product, is that you don't really focus on trying to sell AI. You have to sell the product, and the product can have AI or not, because insurance companies, or anybody else, do not really care whether you have 100 million people doing these things manually or an AI system at the helm of it. From the word go, we knew we had to build a lot of engineering around things, so we hired DevOps, data engineers, and full-stack engineers. On the leadership side of the hiring, we knew we had to get people with an understanding of AI to lead the teams on both ends, and we also had to convince people in the industry to be mentors.

Something that was wonderful for us, and accidental because we didn't really think about it when we were getting in, is that we applied for all the cloud startup programs that were out there: AWS, Azure, Nvidia, Google Cloud. We didn't really have a reason why; we thought, "Ok, we're a startup, there are startup programs, let's just go and get enrolled." That was probably the best thing we did, because we got not just a lot of credits, so we could run massive amounts of experiments overnight using all the GPUs, but we also started getting a lot of help from the engineering teams at these organizations.

There were masterclasses organized by Google, and there was a lot of contact with the AWS folks, who are not just trying to push you into using their cloud; they actually help you with the engineering issues you have and give you best practices and ways to scale systems. We put all these pieces together and introduced sprint planning for research, and this worked great for us; the idea was to have a quick turnaround of POCs. The way we brought Agile to the team was by adopting a flavor that works for us, not something by the book for everybody else. Our main goal was to engineer AI so that we could keep running experiments in a systematic and automated way, giving us a lot of time and room for research.

The result, the first delivery: when we did all this, and we had the long-term vision and were starting to hit the short-term goals as well, we hit about 94% accuracy in production with a system built with absolutely no rules. We had a bunch of different AI systems grouped together and everything was running really well. Yes, we were super happy. The whole point was that even if we did fail at certain things, it was a continuous learning system. We gave the customer the ability to correct the mistakes the system made - a sort of human-in-the-loop system - and there would be continuous improvement in what the system does as well.

At this point in time, we started working heavily with TensorFlow and a lot on Google Cloud. Something we realized early on, simply because of the dependencies we had, was that we should containerize the AI experiments and the training we do, so that we could run them on any infrastructure very easily.

Go Live or Go Home

The next question was: we have successful proofs of concept now, a few customers are happy, and we have to go live. A very important thing for us was that even though we ran these experiments and proofs of concept on the cloud, by default insurance companies do not want to go on the cloud. Even if they do, it's something they're thinking about five years from now; they're evaluating a lot of privacy issues and concerns and different aspects of the system. This means that from the very word go, we needed to build in the capability of providing training, prediction, management, visualization, the feedback loop - everything - to run on-premise.

This is a whole other problem, because especially when you're a smaller company, on-premise feels hazardous: it's very scary, you have absolutely no control, you don't know what's going on in their systems, and you don't really have the bandwidth to keep sending your DevOps engineers to sit there for days, in different parts of the world, to make things happen. We circumvented the problem and found a solution that would work seamlessly for us; I'll talk a little bit about what we did here.

The first thing that we did was we always introduced our product as a human-in-the-loop sort of product, where we said, "Look, the AI systems can do a lot of heavy lifting, but they will make mistakes." If you were spending 30 minutes on a document, you end up spending two minutes now, because we do everything for you; you're just looking and validating that everything looks right, fixing the things that are not, and the system learns again. What we learned not to ignore over the past year and a half is that, no matter what anybody tells you, if you are solving generic things with AI, using a Wikipedia corpus or a Twitter corpus, you can only solve a finite number of problems.

If you really want to solve a business use case for somebody, understanding their domain is very important, because you cannot take, for example, a Wikipedia corpus as your contextual vectors to understand insurance documents and how a policy would look in Italy; the words don't exist, and the way the text is composed doesn't exist in that corpus. Without domain knowledge, you don't really know what you want to extract, what you want to make sense of, and you don't know if you're solving the use case. We built a lot of domain knowledge into our networks, into our data, and into the way we do things.
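
One minimal way to act on that point - sketched under the assumption that you train your own embeddings on an in-house corpus rather than reusing Wikipedia vectors - is to fit fastText-style vectors on your own documents. Gensim's FastText is used here as one possible tool, and the tiny corpus is a toy stand-in:

```python
from gensim.models import FastText  # gensim 4.x API

# In practice: tokenized sentences from your own insurance documents.
domain_corpus = [
    ["versicherungsschein", "policennummer", "selbstbeteiligung"],
    ["invoice", "deductible", "claim", "settlement"],
]

model = FastText(sentences=domain_corpus, vector_size=100, window=5,
                 min_count=1, epochs=10)

# Subword information means even rare or unseen domain tokens get a vector.
print(model.wv["policennummer"][:5])
```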

The second aspect was educating customers on AI. This was crucial for us, because customers also have their chief innovation teams and they're looking at a lot of AI as well, but when they read about things and meet other people, AI does everything for them, and they're constantly surprised that systems are not 100% accurate. It was very important for us to educate our customers on how these AI systems work.

Engineering this - when I say engineering, it's really building a platform for how your internal stakeholders work, how your data scientists work, how your data is stored, and how you push things into production. Having CI/CD for day-to-day work and experimentation itself is very important, and it takes you quite a long way. To achieve this, what we did at Omnius, as the level-zero product - not even a product directly for the customer, but an internal product for ourselves - was to build a platform that had three main verticals. The first is a training platform that enables us to train and retrain models given a data set, because we have standard preprocessing mechanisms and a standard way of training these models, depending on what we want to solve. Once these models are trained, we move them into the prediction system.

The prediction system is the one with the human in the loop: you have pipelines that predict on the data, and the human in the loop can take a look, based on confidence values and document scoring, step in where human intervention is required, and fix issues; this is then used for retraining the model again.

The third aspect is management of everything: you manage how your models run, how your training is running, and the versioning of your models, and you have user authentication and a console for the entire platform.

Our training platform had mainly two different aspects. One was the annotation system, which we built in house. As a tip for any of you, irrespective of whether you're working with structured data, unstructured data, documents, images, or text: everybody here, I'm sure, version controls their code. I would definitely suggest you also version control your data. This is very important when you want to go back one year from now, look at a paper that your team wrote, and say, "Ok, we ran this experiment on this data. Can I do this again?"

If you don't version control your data and your models along with version controlling your code, then you can't really go back in time and do anything. Something quite important that we are doing at Omnius is version controlling everything, so we can go back and re-run experiments; we even version control our annotated data and the annotation job itself.
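
As one minimal illustration of the data-versioning advice (not the GitLab-based setup mentioned later in the talk, just a sketch), you can write a content-hash manifest for a dataset directory and commit it alongside the code and model version; the folder name below is hypothetical:

```python
import hashlib
import json
from pathlib import Path

def dataset_manifest(root):
    """Map each file in the dataset to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

manifest = dataset_manifest("data/annotations_v3")  # hypothetical dataset folder
Path("data_manifest.json").write_text(json.dumps(manifest, indent=2))
# Commit data_manifest.json with the experiment; re-hashing later tells you
# whether you are looking at exactly the same data the experiment used.
```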

The second part is that we have the ability to train and evaluate on data sets as a proper independent task. Given a data set, the modules run, push their metrics, and get version controlled, and you can look at what these metrics are. I will show you what the platform looks like and how our current system is set up; this is something you can do on any infrastructure. When you run these training jobs, you don't need the training infrastructure to be present all the time. Only when you need to train do you bring up the infrastructure; the models train, all of the metadata, models, and artifacts are pushed to an artifact store, and then you can use them if the performance has improved, or start evaluating what is going wrong.
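
The talk names MLflow later in the tech stack, so as a hedged sketch of this "training as an independent task" pattern, a run could log its parameters, metrics, and artifacts to a tracking server; the tracking URI, experiment name, and values below are placeholders.

```python
import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical server
mlflow.set_experiment("table-extraction")

with mlflow.start_run():
    mlflow.log_param("model", "squeezedet-compressed")    # illustrative values
    mlflow.log_param("dataset_manifest", "data_manifest.json")
    # ... training would happen here ...
    mlflow.log_metric("val_accuracy", 0.94)
    mlflow.log_artifact("model.pt")  # push the trained weights to the artifact store
```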

On the prediction side of things, we built four main components. One was an async API for ingestion of documents. Why an async API? You get to throttle, not the requests that you receive, but how your downstream systems actually work. Based on the size of the infrastructure that you have, and on the size of the infrastructure that customers can afford, you control how many documents you process in parallel. This doesn't stop you from accepting a million documents a day; you then stream them through at whatever rate you want, thereby ensuring that your systems don't just die when a huge amount of traffic comes in.
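
A minimal sketch of that async ingestion idea, using Flask since it is the framework named later in the tech stack: accept the document immediately, return an id, and let a bounded worker pool decide how many documents are processed in parallel. The endpoint name and the empty process_document() are illustrative, not the actual API.

```python
import queue
import threading
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
todo = queue.Queue()
MAX_PARALLEL = 4  # sized to the available infrastructure, not to the traffic

def process_document(doc_id, payload):
    pass  # placeholder for the real downstream pipeline

def worker():
    while True:
        doc_id, payload = todo.get()
        try:
            process_document(doc_id, payload)
        finally:
            todo.task_done()

for _ in range(MAX_PARALLEL):
    threading.Thread(target=worker, daemon=True).start()

@app.route("/documents", methods=["POST"])
def ingest():
    doc_id = str(uuid.uuid4())
    todo.put((doc_id, request.get_data()))
    # 202 Accepted: ingestion is decoupled from processing, so a traffic spike
    # queues up instead of overwhelming the downstream systems.
    return jsonify({"id": doc_id, "status": "accepted"}), 202

if __name__ == "__main__":
    app.run(port=8080)
```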

We built a validation user interface for fixing prediction errors and for looking at what the model is really trying to extract. We worked a lot on AI microservices: unlike having microservices just for the engineering aspects, we built stateless AI systems where we connected the output of one AI system to the input of another, just as a microservice HTTP request, sharing artifacts as the payload. This helped us a lot, because natural language processing systems, for example, don't really require heavy-duty preprocessing compared to image processing, where you're trying to bring an image to a standard form, fix the aspect ratio, fix the rotation and the shear, and do all these transforms. You want to scale that microservice horizontally, whereas you would not want to scale something that does not require heavy computation.
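
The chaining of stateless AI microservices can be pictured as one service's response becoming the next service's request payload, so only the compute-heavy step needs to be scaled out. A rough sketch, with hypothetical service URLs:

```python
import requests

def run_pipeline(page_image_bytes):
    # Compute-heavy preprocessing service (deskew, fix rotation/shear, resize);
    # this is the one you would scale horizontally.
    r1 = requests.post("http://preprocess.internal/normalize",
                       data=page_image_bytes,
                       headers={"Content-Type": "application/octet-stream"})
    r1.raise_for_status()

    # Lighter extraction service consumes the normalized artifact as its payload.
    r2 = requests.post("http://extract.internal/predict",
                       data=r1.content,
                       headers={"Content-Type": "application/octet-stream"})
    r2.raise_for_status()
    return r2.json()
```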

Initially we spent quite some time trying to write our own data pipelines in Java, and we realized it was a dead end, because it's a lot of engineering effort and a lot of bookkeeping, and it would have taken us quite some time to have well-managed data pipelines. We ended up making two choices. First, we tried out Apache NiFi as a way to orchestrate the data pipelines, because NiFi gives you the capability of dragging and dropping components in a user interface. However, we found it a little complicated for our team, because we are a heavily Python-focused team, whereas for NiFi you really require a Java person all the time.

We decided that was not the path for us, because then you would need to build engineering around the data science team instead of building a team that can work on and contribute to data pipelines together. We chose Airflow; it came out of Airbnb and is an Apache project now, and it's actually quite fantastic. There's also something else called Luigi, by Spotify. Airflow basically gives you the capability to have a high-throughput system, manage parallel activities through a user interface, and have retry mechanisms for your jobs, in cases where you don't really require real-time processing.
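
For orientation, here is a minimal Airflow DAG in the spirit described above: orchestrated batch steps with retries and no real-time requirement. The task bodies are placeholders, and the imports use the Airflow 1.x style that was current around the time of this talk.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def preprocess():
    pass  # placeholder: normalize scans, fix rotation and shear

def predict():
    pass  # placeholder: call the AI microservices on the preprocessed pages

default_args = {"retries": 3, "retry_delay": timedelta(minutes=5)}

with DAG("document_pipeline",
         start_date=datetime(2019, 1, 1),
         schedule_interval="@hourly",
         default_args=default_args,
         catchup=False) as dag:
    t1 = PythonOperator(task_id="preprocess", python_callable=preprocess)
    t2 = PythonOperator(task_id="predict", python_callable=predict)
    t1 >> t2
```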

The final aspect of the platform was a management console. By management here, I mean configuration management: you have so many different systems, all with different types of configuration, and you need a central place to manage the configurations for your Docker containers, your Helm charts, your AI microservices, all the heavy-lifting preprocessing pipelines, and the API service as well.

The second aspect is user management, which is not just how systems from the outside world talk to you, but also how your internal systems talk to each other, especially when you're dealing with something as privacy-sensitive as the insurance industry. You don't want a microservice that lacks the proper authorization to be able to perform a process if someone gets access into your system, so you have authentication and authorization between services as well as for users.

Two other aspects are your infrastructure logs and application logs. Yesterday I was in one of the talks - I don't remember the speaker's name - where he was talking about Prometheus, Grafana, and the ELK stack. These things are very cool, and this is something we've integrated as well.

At any given point in time, we have metrics for everything: metrics for understanding how our training is being triggered, whether the infrastructure is about to go down, what application logs are coming in during training, and what artifacts are being generated. This is not something you want to monitor only when you go live. You want these things to be monitored when you're training too, so you can understand the sizing of your infrastructure and how you can package everything in a seamless way.
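
As a small sketch of monitoring training the same way as production, a training loop can expose metrics for Prometheus to scrape; the metric names and values here are illustrative, not the actual dashboards.

```python
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

start_http_server(8000)  # Prometheus scrapes this port
epochs_done = Counter("training_epochs_total", "Completed training epochs")
val_loss = Gauge("training_validation_loss", "Validation loss of the current run")

for epoch in range(10):
    time.sleep(1)  # stand-in for one training epoch
    val_loss.set(1.0 / (epoch + 1) + random.random() * 0.01)
    epochs_done.inc()
```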

Tech Stack Check

Our tech stack, currently - this is what we are running. I'll go from left to right. On the left are all our AI microservices. We are using a micro web framework called Flask; for those of you who don't know how Flask started, it was basically an April Fools' joke where Armin Ronacher took a bunch of different Python scripts, put them in one single file, and said, "This is a cool micro web framework." Thankfully that got picked up as a proper project, and right now Flask is really well made and maintained. What we also realized - and this is probably getting into a slightly controversial discussion - is that we should start migrating from TensorFlow to PyTorch.

We have a few strong reasons for this. Our team is a combination of deep learning engineers, full-stack engineers, and data scientists from academia. What we realized, especially with scientific packages in Python and the ease of focusing simply on writing your networks and having them run on large-scale infrastructure, was that PyTorch just seemed seamless. It's code that everybody can understand, it doesn't have weird checkpoints in place, you don't have to specify a hundred environment variables to get things done, and your code starts to become more meaningful.

It is taking us a little bit of time, because we're transitioning all our systems from TensorFlow to PyTorch and we are also measuring against the baseline, but this is something we are really happy about. It's given us a lot of power, especially in the way we configure and run things. Definitely take a look at it; if you're just looking at TensorFlow and finding it cumbersome, PyTorch would be a good alternative, especially if you want to do deep learning.

In the center here are all the environments that we actually run on: we're running on Google Cloud, we're running on AWS for different customers, we're running on the Azure platform, and of course a lot of our customers, about 75% to 80%, are on-premise. The way we can achieve this is because of the two things below: Kubernetes and Docker. We've convinced our customers to install Kubernetes in their infrastructures, and if they need help installing Kubernetes, we provide all the help and guidance required.

This has changed everything for us, because we deploy, manage, and take care of things seamlessly in every environment in the same fashion. We don't have weird things for AWS, weird things for Google Cloud or Azure, or weird things for running on-premise. We are setting up infrastructure in our own office where we run different Kubernetes installations. One other cool thing about Kubernetes is that you can run Minikube on your laptop. If you want CI/CD in place and your engineers are testing and building something, they can immediately push their Docker container and see how the pipeline works locally, without pushing it to the CI/CD servers, waiting for the Docker container to build, and then having someone fix some code and push it again.

On the bottom left are some of the supporting tools: we're using GitLab for our data versioning, the ELK stack for application logging, Prometheus and Grafana for our infrastructure logs, monitoring, and dashboards, and Keycloak from Red Hat, an open source, enterprise-ready system that gives you authentication, authorization, and user management straight off the bat.

In the bottom right corner is MLflow, from Databricks, which we are using for our model versioning, management of metrics, and as a console for our AI modules; I will show you what this console looks like. We're using Airflow, as I mentioned, for our data pipelines, and we're using Metabase, an open source BI dashboard that you can run in your own infrastructure, for generating business reports for the business users on the platform. This is a screenshot of what our console looks like; I will also show you how our systems work now by processing a document.

Learnings

Some of the quick learnings that we've had: as an organization, it's very important to believe that AI can solve problems. AI is no longer just a buzzword; it can really do a lot of wonderful work, but you have to bake engineering into your product. You can't just believe that AI systems do everything and not think about engineering. Agile with AI works, but it takes a little bit of time to figure out what your team really needs and how they need to function. Please pay attention to domain knowledge and detail, and understand the use case you really want to solve; don't just look at a data set and try to publish a paper, especially if you want to solve a business problem.

Don't try to use one hammer for everything; that's something that we did as well. We initially decided we'd do everything with computer vision, and then we said, "No, it doesn't solve all the problems for us." It's good to combine not just AI systems, but also other old-school engineered systems, which are quite good at what they do. Believe in putting a human in the loop; it builds trust with the business. Don't try to oversell that AI can do 100% and that you don't need anybody to do manual work anymore - that doesn't work.

Education of internal and external stakeholders is important. Visualization of your AI modules is super cool; use it. It gives you a very big capability for understanding and explaining this, not just to the business stakeholders in your office, but also to your customers. AI is no longer a black box; if you build things one step at a time, you really get a lot of control over what these systems can do. Automate your processes as much as possible, because that gives you more room for research.

I'm going to upload a document that's a little anonymized, because I can't really use the original. There's a bunch of AI systems running behind this; I just uploaded a car claim document, and with everything we've done so far, you can see what the system actually does. I'm going to try to load it. What you can see is that one of our classification models, based on computer vision and natural language processing, identified every page in the document: whether it's a letter, whether it's an invoice, and you can see the label here. This is the validation user interface, and you can see what the system did on the left here.

It identified what every page in the document belongs to; there's a classification model running that combines both computer vision and natural language processing. On the right, you can see everything that the model has extracted - the problem we were trying to solve, the extraction of information from the table - and you can see where this information is getting picked up from.

It has identified not just the table; it has identified every row, it has identified the elements in each row, and there's not one piece of rule-based code running in the system right now. It has identified every aspect of the information we need to extract for this domain in order to automate car claims handling. This is something I wanted to show, to give a sense of where we've come from and what AI systems can actually do at this point in time.

If you'd like to move to Berlin, a cool city, and work on complex AI problems, we are hiring. We also write a lot about some of the work we're doing in natural language processing, generative adversarial networks, and other topics on our engineering blog, so do check it out.

 


Recorded at:

Jul 03, 2019
