Ludwig is a code-free deep learning toolbox originally created and open sourced by Uber AI. Today on the podcast, Ludwig's creator, Piero Molino, and Wes Reisz discuss the project. The two talk about how the project works, its strengths, its roadmap, and how it's being used by companies inside (and outside) of Uber. They wrap by discussing the path ahead for Ludwig and how you can get involved with the project.
Key Takeaways
- Uber AI is the research and platform team for everything AI at the company with the exception of self-driving cars. Self-driving cars are left to Uber ATG.
- Ludwig allows you to specify a Tensorflow model in a declarative format that focuses on your inputs and outputs. Ludwig then builds a model that can deal with those types of inputs and outputs without a developer explicitly specifying how that is done.
- Because of Ludwig's datatype abstraction for inputs and outputs, there is a huge range of applications that can be created. For example, a text input with a category output gives you a text classifier. An image input paired with a text question (such as "Is there a dog in this image?") and a text output gives you a question-answering system. Many combinations are possible with Ludwig.
- Uber is using Ludwig for text classification for customer support.
- Datatypes can be extended easily with Ludwig for custom use cases.
- The Ludwig team would love to have people contribute to the project. There are simple feature requests that are just not prioritized given the current contributor workload; they're a great place to get involved with machine learning and gain experience with the project.
Show Notes
What's the mission of Uber AI?
- 01:50 Uber AI is the research and platform team for AI in the company.
- 02:00 The only thing we don’t do is self-driving cars; that’s a different division, ATG.
- 02:15 There are five teams in Uber AI.
- 02:20 One is a core research team, whose goal is to write papers, present at conferences, and publish in the research community.
- 02:30 Then there is a connections team, which works on applied AI with other teams at Uber: how to run experiments, parameterise models, and so on.
- 03:00 The other three teams cover specific areas of machine learning; the conversational AI team is building a platform for having conversations that other teams can use.
- 03:20 There's a computer vision team, and a sensing and perception team that works on sensor data, particularly on how to learn from it in an unsupervised way.
What is Ludwig as a code-free toolbox on top of TensorFlow?
- 04:00 The main point is that you can specify a TensorFlow model in a form that is much more digestible, as it is just configuration.
- 04:15 It's a declarative format where you specify what the inputs and the outputs are, and from that Ludwig builds a TensorFlow model that handles those inputs and outputs.
- 04:35 Once Ludwig has built the model, it can train and produce predictions.
- 04:40 Although it can be used in a code-free manner, it doesn’t mean that you cannot use it with code - it’s really extensible and has a programmatic API to chain models.
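As a rough illustration of the declarative format and the programmatic API described above, the sketch below defines a simple text classifier. The column names (`review`, `sentiment`) and CSV files are hypothetical, and argument names have shifted across Ludwig versions, so treat this as a sketch rather than a definitive example.

```python
from ludwig.api import LudwigModel

# Declarative model definition: only inputs and outputs are specified;
# Ludwig fills in a reasonable architecture for this combination of types.
model_definition = {
    'input_features': [{'name': 'review', 'type': 'text'}],
    'output_features': [{'name': 'sentiment', 'type': 'category'}],
}

model = LudwigModel(model_definition)
train_stats = model.train(data_csv='reviews.csv')        # hypothetical training CSV
predictions = model.predict(data_csv='new_reviews.csv')  # batch prediction on new data
```

The same definition can be written as a YAML file and passed to the ludwig command-line tool, which is the fully code-free path discussed in the episode.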
How deep can you go with Ludwig?
- 05:10 The main strength is an abstraction based on data types.
- 05:20 In theory there’s a limitless number of things that Ludwig can do, because you can specify the data type of your inputs and the data type of your outputs.
- 05:25 Depending on the combination of those data types, all sorts of combinations can be done.
- 05:30 For instance, you could have an input image and an output category, from which you can build an image classifier.
- 05:40 If your input is text and the output is a category then you have a text classifier.
- 05:50 If you have an input image and text question, and the output is text, then you could have a system where you can answer questions about the image.
- 06:10 The combination of data input types and data output types makes for an infinite amount of applications.
- 06:20 It's also a weakness to an extent, because in order for users to build a specific kind of application, Ludwig has to provide support for the specific data types it requires.
- 06:30 At the moment, Ludwig’s main data types include text, images, time series, sequences, categories, binary values, and numerical values.
- 06:40 However, it doesn't support speech at the moment, so you cannot build a speech categoriser yet.
- 06:45 There is nothing stopping us from having it; it’s just a matter of time.
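To make the data-type combinations concrete, here is a hedged sketch of the image-plus-question example above. The column names are invented; `stacked_cnn` and `rnn` are encoder names Ludwig provides for images and text, though the defaults may differ by version.

```python
# Visual question answering: two input features of different types, one text output.
model_definition = {
    'input_features': [
        {'name': 'image_path', 'type': 'image', 'encoder': 'stacked_cnn'},
        {'name': 'question', 'type': 'text', 'encoder': 'rnn'},
    ],
    'output_features': [
        {'name': 'answer', 'type': 'text'},
    ],
}
```

Changing the output feature's type to `category` turns the same structure into a classifier, which is exactly the point of the data-type abstraction.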
How would you build a text summarisation system with Ludwig?
- 07:25 Text summarisation is usually one of two types: abstractive and extractive.
- 07:30 Extractive systems usually select the pieces of text that are going to end up in the summary.
- 07:35 Abstractive systems generate new text which isn't necessarily in the original text.
- 07:45 As an example, one way to build an extractive system is to have an input CSV with two columns: one is the input text, and the other is a bitmask of which words or sentences to include in the summary.
- 08:15 What you can do with Ludwig is you can specify the CSV as the input data, and use a model definition which contains one input feature of type text, and one output feature of type sequence.
- 08:35 Ludwig will then learn to map that input text to the output sequence of ones and zeros in the bitmask.
- 08:40 New text can then be fed to that system to generate a new bitmask selecting which parts of the text are extracted.
- 08:50 The only caveat is that you have to specify the tagger decoder for the output sequence.
- 09:05 In Ludwig, both inputs and outputs can have different encoders and decoders.
- 09:15 For text, you can have an encoder which is an LSTM, a CNN, or a transformer.
- 09:25 The user selects an encoder by name in the configuration, and the same applies for the output.
- 09:30 For sequential outputs there's a generator, which generates text token by token, and a tagger, which takes each element of the input and classifies it.
- 09:50 For the text extraction case, it would classify each single element of the input (sentence or token) as a one or a zero.
- 10:00 The length of the input sequence and the output sequence are perfectly aligned.
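A minimal sketch of the extractive-summarisation setup described in this section, assuming a CSV with a hypothetical `article` column for the input text and a `keep_mask` column holding a space-separated sequence of ones and zeros, one tag per input token:

```python
# The 'tagger' decoder gives each input token its own one-or-zero prediction,
# keeping the input and output sequences aligned as described above.
model_definition = {
    'input_features': [
        {'name': 'article', 'type': 'text', 'encoder': 'rnn'},
    ],
    'output_features': [
        {'name': 'keep_mask', 'type': 'sequence', 'decoder': 'tagger'},
    ],
}
```

At prediction time, the positions tagged with a one indicate which tokens (or sentences, if tagging at sentence level) to copy into the summary.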
How much data do you need for training?
- 10:10 That depends entirely on the task; there are some rules of thumb you can follow.
- 10:20 If you have a classification problem, the rule of thumb is you need to have at least 1000 data points for each class.
- 10:25 You could do well with less if your classes are pretty well separated from each other.
What are some of the use-cases that Ludwig is being used for?
- 10:50 The project started from a customer support model called COTA - there's a [blog post](https://eng.uber.com/cota/) and a [paper](https://eng.uber.com/research/cota-improving-the-speed-and-accuracy-of-customer-support-through-ranking-and-deep-networks/) if you're interested.
- 11:00 We wanted to provide customer support representatives with a suggested classification for each incoming ticket, and the best template for answering it.
- 11:20 In the beginning I built two models, one for each of those tasks.
- 11:30 Then new features came in related to specific trips: at first we only had the text that the user was sending us, and later we also had additional information about the trip the user had taken - how long it was, how much it cost, whether it was completed or not.
- 11:50 Instead of making it as a one-off thing, I made it so that you could add categorical features, numerical features and so on.
- 12:00 The same model could be used for doing the same task: if you know what kind of ticket it is, you have a better understanding of how to answer it.
- 12:10 Ludwig is the generalisation of this package that I built for this problem.
- 12:25 Now it has been used for many other things; we’re using it for conversational AI for language understanding and generation.
- 12:35 It is also being used for matching questions against Stack Overflow and in our chats.
- 12:50 There’s a bunch of other use cases that it is being used for; in Uber Eats, it’s being used to predict expected time of delivery.
- 13:00 There are new products being built on Ludwig, including computer vision applications for Uber Eats and for problems in the Uber marketplace.
- 13:20 Among the external users that I'm aware of, Ludwig is being used at Apple in their stack of tools for data scientists, and by a couple of startups, for example for analysing musical lyrics and assigning labels such as sentiment or other measures of subjective feeling.
- 14:05 I know it’s being used for forecasting tasks and prediction of stock markets and sport - in particular for hockey and basketball.
What type of accuracy are you seeing with Ludwig?
- 14:35 There is nothing specific in Ludwig which makes models good or bad.
- 14:45 Ludwig is more of a structure that allows models to be used.
- 14:50 It depends on the models and the data used to train those models.
- 15:00 In all my experiments so far with models you can train on open datasets, Ludwig is always in the ballpark, within about a 1% margin depending on the task.
- 15:20 You could get a bit more accuracy if you spent a long time custom-building your model, but with Ludwig you don't have to do anything to get a good model.
- 15:30 If you have a custom model that gets that extra performance, you could contribute it back and have it as one of the options that Ludwig makes available.
Beyond building on TensorFlow, does Ludwig enable algorithm transparency?
- 16:25 Deep learning models are a black box, and there are some projects like LIME and SHAP that allow you to do a local linear approximation of the model around specific data points.
- 16:40 There is nothing specific in Ludwig that supports that yet, but applying those tools to the predictions of the model could be beneficial.
- 16:50 What is already there are visualisations of predictions that the model makes and also the quality of those predictions.
- 17:00 When you train a Ludwig model, you're actually training a TensorFlow model, so you get TensorBoard logs along with the metrics that are tracked during training.
- 17:15 On top of that, Ludwig provides 15 or so visualisations which take the outputs of the training or prediction process.
- 17:35 You can compare multiple models on different metrics, compare how the models are calibrated, and impose thresholds on the confidence of the model to see for how many data points you can obtain predictions at a certain level of accuracy.
- 18:00 You can also compare the predictions of your models on a different data set and see where you get predictions right or wrong.
- 18:15 In general all these visualisations give you more understanding and confidence of the model predictions, even if you cannot see what is going on in the inside of the model.
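Those visualisations are driven by the statistics files that training and prediction produce, and can be generated from the `ludwig visualize` command or programmatically. The sketch below assumes the Python entry point `ludwig.visualize.learning_curves`; function names and signatures have varied across versions, so this is illustrative rather than definitive.

```python
from ludwig.api import LudwigModel
from ludwig.visualize import learning_curves

# Hypothetical comparison of two ticket classifiers that differ only in encoder.
definitions = {
    'rnn_model': {
        'input_features': [{'name': 'ticket_text', 'type': 'text', 'encoder': 'rnn'}],
        'output_features': [{'name': 'ticket_type', 'type': 'category'}],
    },
    'cnn_model': {
        'input_features': [{'name': 'ticket_text', 'type': 'text', 'encoder': 'parallel_cnn'}],
        'output_features': [{'name': 'ticket_type', 'type': 'category'}],
    },
}

all_train_stats = []
for name, definition in definitions.items():
    model = LudwigModel(definition)
    all_train_stats.append(model.train(data_csv='tickets.csv'))  # hypothetical CSV

# Plot training/validation curves for both models side by side.
learning_curves(all_train_stats, model_names=list(definitions))
```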
How is Ludwig extensible?
- 18:35 Ludwig is open source, so you can obtain and change the code.
- 18:45 We tried to make it easy to extend in two ways, and we wrote a developer's guide that helps with extending Ludwig.
- 19:00 First, given a specific data type (like text), it’s easy to add additional encoders for text.
- 19:20 All you have to do is conform to a really lightweight interface.
- 19:30 You specify the hyperparameters of your model as input parameters, and you are given a tensor of a specific shape.
- 19:40 Whatever piece of code goes inside the interface can transform that tensor however you want.
- 19:55 That's how we have RNN, CNN and transformer encoders: they all go inside this interface.
- 20:00 If you want a different model, it’s really easy to add an additional model to conform to that interface.
- 20:10 Another way to extend Ludwig is to add additional data types.
- 20:15 This is a bit more complicated, but still relatively simple.
- 20:20 We provide one class for input features, and one class for output features.
- 20:25 There are a bunch of interfaces that the developer has to implement.
- 20:35 One interface maps raw data to tensors.
- 20:40 Another interface says how to compute the loss for this specific data type, and another is how to compute predictions for this data type.
- 20:50 These are things that you would have to implement anyway in a self-built system, and you just have to find a way to organise them.
- 21:00 Once you provide these functions, the infrastructure that Ludwig provides regarding training, looping, model construction etc. comes for free.
- 21:10 It’s great for companies that don’t want to create technical debt.
- 21:15 If you want to create a speech recogniser, you can add it to Ludwig, and then anywhere you have a speech problem that you want to train a model on, you can use Ludwig for free.
- 21:30 It’s a way around technical debt and code reuse.
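As a flavour of what conforming to that lightweight interface looks like, here is a purely illustrative encoder sketch. The class and method names below are assumptions made for illustration; the actual interface and registration steps are documented in Ludwig's developer guide.

```python
import tensorflow as tf

class ParallelConvEncoder:
    """Illustrative text encoder: hyperparameters arrive through __init__
    (from the encoder section of the model definition), and the call turns
    a [batch, sequence_length, embedding_size] tensor into one encoding."""

    def __init__(self, num_filters=64, filter_sizes=(2, 3, 4), **kwargs):
        self.convs = [
            tf.keras.layers.Conv1D(num_filters, size, activation='relu')
            for size in filter_sizes
        ]

    def __call__(self, inputs):
        # Run parallel convolutions, max-pool over time, and concatenate;
        # Ludwig would hand the result to whichever decoder the user selected.
        pooled = [tf.reduce_max(conv(inputs), axis=1) for conv in self.convs]
        return tf.concat(pooled, axis=-1)
```

Registering such an encoder under a name would then let users pick it in the model definition, just like the built-in ones.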
What does the community around Ludwig look like now?
- 21:40 There are two main contributors other than myself; one is Yarick and the other is Sai Miryala, both of whom work for Uber.
- 21:50 Sai did most of the work on testing, and Yarick helped a lot with some of the input features and organisation of the code.
- 22:15 There’s been a number of contributions from outside, like Comet.ML who contributed integration with their platform.
- 22:30 When you train Ludwig models, you can keep track of the training runs and statistics on Comet.ML, which is pretty handy.
- 22:40 There have been many contributions to the documentation.
- 22:50 Code-wise, people have contributed many bugfixes, but not so many new features yet, because it's a new project and people have to get used to it first.
What does it look like to on-board new contributors?
- 23:30 We have a number of feature requests on GitHub and some of those are pretty simple.
- 23:40 The reason we haven’t addressed them yet is purely a matter of prioritisation.
- 23:45 Someone starting with Ludwig could tackle some of these feature requests without having to understand all the fine details of the codebase.
- 24:00 We provide support, and if people are interested they just have to reach out to me, Sai, or Yarick, and we can help with some of those simpler issues.
- 24:15 I have an example: I went to Sofia for the Uber summit, and one engineer reached out to me; they are now adding to the visualisation support inside Jupyter notebooks.
- 24:50 I would encourage people to get involved.
- 25:00 The code is at https://github.com/uber/ludwig, and there is documentation at https://ludwig.ai
What’s on the roadmap for Ludwig?
- 25:35 There are a lot of ideas and new features, but there are two main features that we're focussing on for the next release.
- 25:45 One is new encoders and data types - there are relatively new state-of-the-art models which we want to make available for users to select.
- 26:05 The same applies for images and other types.
- 26:15 We also want to have support for other data types - for example, speech, potentially videos, point clouds, graphs and trees.
- 26:30 The other big thing we're working on is scaling up Ludwig so it's easy to work with really big datasets that live on distributed data sources like HDFS or S3.
- 26:45 The strength and limitation of Ludwig is that it works on CSV files at the moment.
- 26:55 It’s a strength because it’s really easy to use for people if they have CSV already.
- 27:05 If you have a large amount of data that doesn't fit on a laptop, you can't work with it in Ludwig at the moment.
- 27:15 We're working on an integration with [Petastorm](https://github.com/uber/petastorm), an open-source project from Uber that allows you to interact with and ingest data from HDFS, S3, and so on.
- 27:30 That will allow training Ludwig on potentially petabytes of data.