Panel: Future of Language Support for ML


Summary

Jendrik Jördening, Irene Dea, and Alanna Tempest take a look at the state of the art of ML/AI development and how advances in language technology (specifically differentiable programming languages) can help.

Bio

Jendrik Jördening is CTO at Nooxit. Irene Dea is a software engineer at Facebook. Alanna Tempest is a software engineer at Facebook.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Schuster: Since everybody loves ML and AI, I thought we should probably look at, what's differentiable programming and what other research topics are out there that will improve ML and AI. For this purpose, I've invited this really illustrious panel here. Maybe you can introduce yourselves and why you're here. What your connection to ML is.

Dea: I'm Irene Dea. I gave the talk on differentiable programming in Kotlin. It's about the framework that we're building at Facebook. Specifically, on the team, I work on the static shape checker.

Jördening: I'm Jendrik. I'm CTO of Nooxit. We're deploying a lot of machine learning into production. I originally started in data science, now I'm going more into the Ops parts.

Tempest: I'm Alanna. I work with Irene actually, on differentiable programming. I'm specifically focused on the performance of our tools, so making sure we can train models quickly. Prior to that, I worked on a compiler team at a hardware startup for AI.

What Differentiable Programming Is, and the Difference to Other Approaches

Schuster: Maybe you can just give a quick elevator pitch of what differentiable programming is, and why it's different than other approaches.

Dea: A lot of the popular frameworks out there, JAX, TensorFlow, PyTorch, are really great for traditional machine learning model use cases. Once you step out of those boundaries, they don't really do so well in terms of performance and usability. What we're trying to do with differentiable programming in Kotlin is take a compiler-aware approach to this problem, and we're trying to support these other use cases that are outside of the traditional machine learning realm.
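
To make the idea concrete: differentiable programming means taking gradients of arbitrary program code, control flow included, rather than only of predefined layers. A minimal PyTorch sketch of that idea (illustrative only, not the Kotlin framework's API):

```python
import torch

# An ordinary function with data-dependent control flow, not a predefined layer.
def simulate(x: torch.Tensor) -> torch.Tensor:
    if x.sum() > 0:                      # branch taken depends on the input values
        return (x ** 2).sum()
    return (3.0 * x.abs()).sum()

x = torch.randn(4, requires_grad=True)
loss = simulate(x)
loss.backward()                          # differentiate through the whole program
print(x.grad)                            # gradient for the branch that actually ran
```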

Performance of Tools

Tempest: Actually, it's really interesting. We're using Kotlin. It can compile to a number of different backends, but its primary backend is the JVM. There's other projects that have worked on machine learning on the JVM, and of course, there's Spark. Machine learning on the JVM in the sense of training ResNet is not really a big thing that people do. I think a lot of people stick to Python, C++, and that stack. The interesting stuff that I get to work on is like, how do I work with garbage collection and memory management in the JVM? How do I make that work with these C++ libraries, especially the Intel ones and then also CUDA that are super optimized for running things like convolutions?

Schuster: Is it getting the data across to the native code, or is it taming the garbage collector, or stuff like that?

Tempest: It's a combination. For one thing, in the JVM, you can't get a big array of uninitialized data. You have to zero out that data because the JVM is like, we're memory safe. We're not going to give you uninitialized data because we're safe. That's a really great thing about the JVM. When you get into the performance of some of these Ops, you're like, just give me the array. I don't have time to zero, and I don't need it zeroed. Things like that.
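
The same trade-off exists outside the JVM; NumPy, for instance, distinguishes zero-filled from uninitialized allocation. A rough analogue of what she's describing:

```python
import numpy as np

n = 100_000_000

# zeros() pays for writing a zero into every element before you can use it.
zeroed = np.zeros(n, dtype=np.float32)

# empty() skips the fill; the contents are whatever happened to be in memory.
# This is safe only if every element is overwritten before it is ever read.
raw = np.empty(n, dtype=np.float32)
raw[:] = 1.0
```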

Schuster: All the memory safety in the JVM is fighting it there.

Tempest: It's good and bad. It's something to work with. It's obviously super useful to our users, because if users had to deal with the level of control that one has to deal with in C++, it would just get in the way of training the model, of doing what you're trying to do.

Differentiable Programming for Training

Schuster: It's mainly used for training, not for inference, differentiable programming in your Kotlin library?

Tempest: I've mostly been focused on performance of training as a whole right now. I think there are other tools that are actually pretty good at inference. Training is more of like, I want to iterate quickly. I want to make a change and I want to see if my loss decreases. Versus if you're doing inference, you maybe have more time to just take what you've trained and feed it to an optimizer, and let that run for five hours before you put that in production. While we might be interested in that later, I haven't personally looked at that yet.

Tools for Building ML

Schuster: I think you were not familiar with differentiable programming before. What are you using now? Do you use Stone Age tools to build ML?

Jördening: I use pen and paper. Then I start iterating. An iteration takes me around 10 years, depending on the test size. I started with TensorFlow, and then PyTorch came out, I think five or six years ago. It's a long time ago. That was already a humongous jump in tech, because it gave you a lot of reusability, which I really liked, because you started having the modules. If someone wrote a module or a neural network and you wanted to reuse it, you just instantiated the class and added it to your object. You were like, I can just reuse the network, and it was so easy. In TensorFlow, basically, you started searching the index for the weights of the convolutional layer in this humongous weight file, and you were not really happy about doing that all the time.
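
A rough sketch of the reuse he describes: in PyTorch, someone else's network is just a class you instantiate and attach to your own module (the names and sizes below are made up):

```python
import torch
import torch.nn as nn

# Someone else's published network: just an ordinary class.
class SmallEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

# Reuse is composition: instantiate it and add it to your own model.
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = SmallEncoder()                 # the reused network
        self.head = nn.Linear(16, 10)

    def forward(self, x):
        features = self.encoder(x).mean(dim=(2, 3))   # global average pooling
        return self.head(features)

logits = MyModel()(torch.randn(2, 3, 32, 32))         # shape (2, 10)
```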

 

I really stuck with PyTorch since then. I always wanted to try TensorFlow, too. I never had the time for it and never had the urge to do it, because I really liked the integration between Python and PyTorch. I definitely want to try Kotlin at one point, because it makes my life easier when arguing with my co-founder about all the cloud bills, because Python is definitely not so efficient when it comes to resource usage. Yet, in the end, you always have this tradeoff between time it takes me to write it in Kotlin, and time to write it in Python, and then the cloud bill. Most of the time, at least in small companies, you end up with saying, my time is more valuable than the money I throw at the cloud provider for giving me one more node. I think at Facebook, that's a different story if you have the number of data scientists that Facebook has, or the number of models you try to train. Yes, not pen and paper anymore.

Dea: You also mentioned that you are really focusing on bringing models to production, are you also using PyTorch for that?

Jördening: Yes. Currently, I still use PyTorch for that. You can get it a lot faster by just going to 16-bit floating points. Most of our models are so small that that's ok in terms of response time. You get an HTTP request, and normally the time from the person asking you the question until it reaches your server is already so long that they don't really notice that your model takes 10 milliseconds more. If you go to bigger models, you can do quantization or some other things to get it to run faster, or use TensorRT, which I tried two years ago and never touched again, because it was such a pain to build everything in C++. That's why inference is its own whole story. You can start doing pruning, and then you can make a doctoral thesis out of optimizing this one model.
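
A minimal sketch of the float16 trick he mentions, in PyTorch (the model and sizes are hypothetical; half precision mainly pays off on a GPU):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2)).to(device).eval()
x = torch.randn(1, 128, device=device)

if device == "cuda":
    # Cast weights and inputs to 16-bit floats: half the memory, faster matmuls,
    # usually negligible accuracy loss for small models like this one.
    model = model.half()
    x = x.half()

with torch.no_grad():
    out = model(x)
```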

Why Kotlin?

Schuster: The choice of Kotlin is one question. Irene, you answered that in your talk: it's because of the compiler infrastructure, and stuff like that. I think that's the reason.

Dea: One big reason is also just the performance, and Kotlin being a JVM language. Another big one is the static compilation aspect and also the static typing.

Schuster: I think that was also one of the big arguments for Swift for TensorFlow, the static typing in Swift compared to Python and static compilation rather than having to carry around the Python interpreter and stuff like that.

ML and JVM Performance, Considering Off-Heap Memory

In regard to the ML and JVM performance, have you considered off-heap memory?

Tempest: Which heap? Off of the JVM heap, like C++?

Schuster: Off-heap in the sense of not on the Java object heap, but using Unsafe or something like that, some other kind of buffer?

Tempest: That has been on the to-do list actually, for a little while. That's what's happening for the prototype GPU support that I have right now. The memory has to live on the GPU, so we are doing that. What you mean is wrapping float buffers to expose to Java; I haven't tried that. That is another thing to try.

Hardware and GPUs When Writing Code in Kotlin

Schuster: Talking about GPUs, what's the story with hardware, if you write your code in Kotlin, how much can you offload to fancy pants hardware?

Tempest: It depends on the libraries available for the hardware. Basically anything we can connect to from C++, we can send down through Kotlin. With cuDNN, CUDA on GPUs, it's relatively straightforward because you can run an Op at a time on the GPU. The part where it gets challenging is the world of accelerators, where if accelerators aren't expecting the same offset, or if the cost to offload to the accelerator is high, then you're going to need to send down a group of operations at once. That's something that we hopefully plan to support in the future. I would like to support that in the future. We'll have to see how that goes.

Kotlin Implementation and ONNX Compliance

Jördening: Is the Kotlin implementation actually already ONNX compliant? I think three years ago, ONNX started and was like, we can finally start moving our AI models from A to B to C, and nobody had to worry. Then it disappeared and nobody ever talked again about it.

Tempest: I think at one point I played with ONNX. Yes, I haven't seen much activity around it recently.

Dea: I think the ONNX stuff was really big when we were doing a lot of exploratory stuff with our project. Then, yes, same experience, where it just disappeared. No one ever talked about it again.

Schuster: ONNX is the interchange standard for models, I think, is it?

Tempest: It tried to be.

Dea: It tried to be an IR for models so that you could take something in TensorFlow, then have it spit out something in PyTorch or something like that.

Jördening: Or TensorRT.
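
For reference, the interchange idea in practice is exporting a model to an .onnx file that another runtime (ONNX Runtime, TensorRT, and so on) can load. A minimal PyTorch sketch with made-up shapes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 30 * 30, 10),
)
model.eval()

dummy = torch.randn(1, 3, 32, 32)        # example input pins down the graph's shapes
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["image"], output_names=["logits"])
```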

Kotlin and Native Compilation Support

Schuster: Actually, talking about native code or JVM, so currently Kotlin also has native compilation support. Do you run on all of those, or is it just the JVM?

Dea: I think we're focusing mostly on the JVM. It's true that Kotlin can also target JavaScript and native, which is great, because that means they can really be heavily supported on both mobile platforms, and also web programming. Currently, I think the main use case of Kotlin native is really to have different types of mobile support. I don't think the purpose of Kotlin native is necessarily for performance at the moment. We haven't really looked into that.

Kotlin on Android

Schuster: How does Kotlin work on Android? It's compiled to Dalvik?

Dea: Yes. It uses the JVM backend to be used on Android.

Kotlin Pros and Cons

Schuster: Kotlin pros and cons. I think we just talked about different backends and stuff like that. I feel there's a story.

Tempest: I actually love using Kotlin. I came from the Python and C++ world, and I was a big fan of both. I was afraid of the JVM. The experience of writing Kotlin is very nice, in my opinion. I like having the type system, but it's not a particularly restrictive type system, in the sense that you don't have to declare a type on a variable, you just have to declare it on a function, which Python is getting towards too with its type hints. It's personally a really awesome experience. I like it. I think Python could be faster for quick stuff: when I can keep in my mind all of the argument and return types of all the functions in my code, then Python is perfect, and I think it is faster for me. But I get documentation from the types, and especially from the tensor typing, Irene's project; I'm its biggest fan. It's a pretty awesome experience, in my opinion.

Jördening: We're very heavy on the Python stack. One reason is of course the ecosystem for ML/AI is simply so humongous in Python. We indeed use all the type hints in Python, and I totally agree that if possible, use that, because I think it's nicer when they're a first-class citizen. It makes it easier for people to get into a language if you don't have to start with typing and then later explain to them that there's a thing you can use to basically check a lot of things which will otherwise lead to bugs. I went from Python to Java to Python, and not to Kotlin yet, because I'm just stuck in the Python world. Because I'm always thinking, I have Docker, what do I need a JVM for? It's all virtual anyhow. That's maybe very particular to us developing our services. We use lots of Golang for microservices as well, because it's simply super-fast for small microservices, but for everything that's ML/AI related, we're faster developing in Python, and that's basically our pro for using Python. I think if you're used to writing Kotlin, and someone has an amazing Kotlin library for ML, definitely go for that. In the beginning, go with whatever is most comfortable for you to start with.

Dea: I would really say though that, as someone who's also come from a lot of Python, I do think Kotlin is actually a great starting language. I wouldn't even say that it's much heavier than Python. It's definitely much lighter than Java. There's an interpreter as well, so you can script stuff pretty quickly. Python without types, I cannot handle anymore. Writing code without types, I have no idea what's going on. To that point, also when I look at model code in Python that doesn't have any comments about the shapes, that drives me nuts. I have no idea what's going on. If you go on Stack Overflow, which I've done many times, and look up something like tensor shape mismatch, you will see so many errors from people who are just trying to use the basic tutorial, like conv2d, just a few layers, on a new dataset. They just have all these issues because they're using it on a new dataset, and maybe two of their dimensions are flipped, and that's it. They run into all these issues, and they have no idea what's going on. I feel like from that perspective, that's where types and static shape checking are really beneficial.
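
A minimal reproduction of the failure mode she describes, assuming PyTorch's conv2d and made-up sizes; the only bug is that the channel dimension is in the wrong place:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

ok = torch.randn(8, 3, 64, 64)        # NCHW, what PyTorch expects
print(conv(ok).shape)                 # torch.Size([8, 16, 62, 62])

flipped = torch.randn(8, 64, 64, 3)   # NHWC, e.g. images loaded channel-last
try:
    conv(flipped)
except RuntimeError as e:
    print(e)                          # "expected input ... to have 3 channels, but got 64"
```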

Then, also, I think as people are writing bigger ML programs, I feel like doing little scripty stuff in Python is great, but when people are writing bigger things, I think that's where having a statically typed language with static shape information, and great IDE support is going to be really crucial. Especially when you're collaborating with people, and you're sharing code, I think that's where it gets really important.

Tempest: I totally forgot about the IDE. I want to jump on that. Kotlin comes from JetBrains. JetBrains makes IDEs. Kotlin is designed to be awesome in the IDE. It is. It's like when you start writing code in IntelliJ, it gives you all these suggestions. It tells you how to write canonical code. I came from a world of Python with not that many suggestions, and you just learn. You learn your PEP 8. You learn your rules. It's actually really nice to have that, to be taught what to do as you're learning.

Dea: I was just going to add to the IDE point. I think also because the IDE is also hackable, you can also add suggestions that are more domain specific. We have IDE support for static shape checking and we can add IDE support for other things as well.

Jördening: I have VS Code set up with auto formatting using the Black formatter. I have my Python type checking and automatic PEP 8 testing. I'm super fine with staying with Python. The advantage of VS Code, which I prefer over the JetBrains one, is that you can have multiple languages in one IDE. There's always this fight: if you're a person writing one language, you just have one language, but if you write Go, Python, Helm charts, then Terraform code, you're just like, I just want all of that in my IDE. Don't bother me. If it loads for 10 seconds, that's fine. Then it works. If you're focused on the one language, I totally agree, there's nothing nicer than a nicely set up IDE. Then you don't have to share your settings file with half your company, which is what I'm currently doing.

Static Shape Checking

I actually had one question regarding the static shape checking. Especially for semantic segmentation, you can somewhat reuse the network for different image sizes. Does that currently work or is it very static?

Tempest: Generics.

Dea: We have polymorphism. We do support polymorphism with shapes. You could say maybe your input is N and your output is M, and M gets assigned at the call site when you actually use it.

Go for ML Workloads

Schuster: Do you use Go for actual ML workloads or for other stuff?

Jördening: No, we use it for microservices, especially for stuff that needs to get super small. The memory footprint of Go is amazing. Especially if you run in the cloud, you can just put 100 Go containers on one node, while having 2 Python nodes just explodes it. We don't use it for ML, simply because the Python ecosystem is there. I think in Go, you would need to rebuild a lot of it. That's why we simply use Python. That's, I think, generally one of those advantages of microservices, it's like, I will definitely try the Kotlin ML stuff at one point. I'll be like, I just put it alongside the Python stuff and alongside the Go microservices. I'm generally interested to see how many languages will support ML, because it makes sense if you have monolithic applications, where you basically have to stay in your language. With Kotlin, you can basically integrate C++. With Python, you can integrate everything. The question is just how long it takes to call in runtime. Yes, I'll just be interested to see how many different languages will support it, and actually where ONNX goes, because I think it was a great idea. If we have all the frameworks, we basically want to move models from A to B to C, this cross language support will be, I think, very interesting as well. We're not using Go for ML currently, at least. Let's see what comes in the future.

Tempest: Regarding your comment about ONNX, it'll also be interesting to see how models will evolve. A model that we've talked about on our end, which I think is pretty cool, is the SLIDE model, which uses hash tables and a smart hashing scheme to drastically sparsify a densely connected network. I don't think ONNX is going to be able to represent that anytime soon, or maybe it will. It seems like there's a direction of ML research that's just exploding in the kinds of things you can do with architectures. It'll be really interesting to see what happens there. That's the kind of thing differentiable programming is excited about.

ONNX and the Interchange Model

Schuster: With ONNX, you mentioned that there's things you can't really represent in ONNX. Does the interchange model make sense, or will it have to wait for another iteration of research to know what it has to support?

Jördening: I think it's simply not so needed by the community right now, because basically, I think you need to convert every layer in every language to the respective layer in every other language. That's, I think, simply a humongous amount of effort. That's what is limiting it. I think, for dense layers, it has no problems. Convolutions are no problems. The first thing is already, I think, striding: where do you start counting the stride? It's different between TensorFlow and PyTorch. They start shuffling around the dimensions. What was the direction? I think PyTorch prefers batch second, when it comes to time series, while TensorFlow likes batch first. Which makes sense, because I think PyTorch is more tailored to CUDA, which likes it batch second. That's where shape checking actually helps you, because then you know which layout you're running with.

It's not that needed yet. Especially if we see models appearing, and especially new research papers appearing, in languages other than Python, it will get really relevant. Because then if you basically want to reproduce the paper, you somewhat need a way to transport the model around. Especially if you want to then start iterating on it. I hope it will reappear. It's a humongous effort, I think. That's why it probably died down a bit, because people realized how complex it would be.
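
To make the dimension-ordering point concrete: PyTorch's recurrent layers default to sequence-first ("batch second"), while Keras-style code is batch-first, and PyTorch only switches with an explicit flag. A small sketch with made-up sizes:

```python
import torch
import torch.nn as nn

seq_len, batch, features = 20, 8, 32

# Default PyTorch RNN layout: (seq_len, batch, features), i.e. batch second.
lstm = nn.LSTM(input_size=features, hidden_size=64)
out, _ = lstm(torch.randn(seq_len, batch, features))        # out: (20, 8, 64)

# Batch-first layout, as TensorFlow/Keras users expect, needs the explicit flag.
lstm_bf = nn.LSTM(input_size=features, hidden_size=64, batch_first=True)
out_bf, _ = lstm_bf(torch.randn(batch, seq_len, features))  # out_bf: (8, 20, 64)
```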

Kotlin's Fit for Web Apps, ML, and Mobile

Schuster: Wondering if Kotlin is fit for a web application, or ML, or mobile, or all of these? It's for Android, so people are using it there. For a web application, I don't know if Kotlin for JavaScript is mature enough or not.

Dea: I've tried it actually. It's pretty good. I haven't tried it for ML specifically. JavaScript, or Kotlin for JavaScript, is actually quite easy to use. I originally wrote this program, maybe like 700 lines, in Kotlin for the JVM, and just flipped the switch for JavaScript. I wrote a little thing to actually display stuff on the web and it worked. I didn't have to change any of the meat of the code. It's pretty good. In terms of speed, I'm not sure if it was very performant. I think it's great that I was able to use the same code. I think that's one of their main goals: code reuse across these different backends.

What Is Probabilistic Programming?

Schuster: Just before we started the panel, I threw out the word probabilistic programming, because we all like to introduce new paradigms. Who would like to do an elevator pitch or elevator explanation for probabilistic programming?

Dea: Probabilistic programming basically aims to allow people to add a level of uncertainty to their models and to show uncertain statistical relationships. For example, what you do with probabilistic programming is you generally first give some assumption about your world. You might have like some distributions, and then you provide a set of observations. You have, this is how I think the world functions. Then you have, this is what I see happening in this world. Then what comes out of it is a prediction of how your world actually is, given those assumptions and your observations. That's my high level overview. It's a great way to add uncertainty because the world is just full of uncertainty. You have these underlying relationships, these underlying effects and causes to the data that you actually see. That's why it's important.
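
Probabilistic programming languages automate this inference step, but the assumptions-plus-observations-to-posterior loop she describes can be sketched by hand for a toy example (a coin with unknown bias):

```python
import numpy as np

# Assumption about the world: the coin's bias p could be anything in [0, 1].
p_grid = np.linspace(0.0, 1.0, 101)
prior = np.ones_like(p_grid)                       # uniform prior

# Observations: 9 heads in 12 flips.
heads, flips = 9, 12
likelihood = p_grid ** heads * (1 - p_grid) ** (flips - heads)

# Posterior: what the world probably looks like, given assumptions + observations.
posterior = prior * likelihood
posterior /= posterior.sum()

print(p_grid[np.argmax(posterior)])                # most probable bias, 0.75
print(float((posterior * p_grid).sum()))           # posterior mean, about 0.71
```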

Probabilistic Programming in the ML Space

Schuster: Is this in the ML space? Is this an alternative to the various networks and models or is it in addition to those?

Dea: I think it's separate. It's like an added information to your models.

Tempest: Because it requires the addition of the assumptions. If you're just working on your machine learning model, then you just have your data and you're going to throw it at whichever architecture you've picked, or a couple of them, whereas programming in the assumptions is a big additional input. Because in a way you're restricting the model, is my understanding. This might be a little off, but you're restricting the model in a way, based on what you think the world looks like.

Jördening: I think where you can actually add it together is basically when you make probabilistic predictions. We, for example, used that when we did power predictions at the company I worked at. We wanted to predict how much power we would consume, to then go to the stock market and buy the amount we need. What you want to know is not only the answer, like you will use 50 megawatts, but you want to know, is it within plus and minus 10 megawatts, or are you uncertain from 100 to zero, because that changes how much you will actually buy. The nice thing you can then see as well is that if your input data goes out of the training distribution, you can see how your output's uncertainty increases. We actually modeled that with power outages we had. Basically, you could see in advance that some sensor data went crazy, and then you could just see that the model got more uncertain about how much power we would actually use. It's helpful for that. It's basically nice to know in advance how certain your model itself is about the prediction it makes. For that, it's pretty handy, I think.
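
One common way to get the "plus or minus" he describes is to have the model predict a variance alongside the mean and train with a Gaussian negative log-likelihood. A hypothetical PyTorch sketch (feature counts and data are made up):

```python
import torch
import torch.nn as nn

class PowerModel(nn.Module):
    """Predicts a mean and a variance, so each forecast carries its own uncertainty."""
    def __init__(self, n_features: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, 1)
        self.logvar_head = nn.Linear(64, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h).exp()   # exp keeps variance positive

model = PowerModel(n_features=12)
loss_fn = nn.GaussianNLLLoss()          # penalizes both over- and under-confidence

x = torch.randn(32, 12)                 # e.g. sensor readings
y = torch.randn(32, 1)                  # observed consumption
mean, var = model(x)
loss_fn(mean, y, var).backward()        # mean and uncertainty are trained jointly
```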

Different Languages for Inference

Schuster: We have a quick question about using Rust, I think, for inference. Does the language in that sense make that much of a difference for inference?

Tempest: Different languages make a difference for inference. I think basically all models developed in Python-based frameworks are exported to C++ for inference, so certainly potential for a difference there. I'm not sure how big the difference would be, but certainly potential.

Jördening: It is reasonably big. At least, going from Python to C++ accelerates stuff, but normally the first thing you do is go to float16 or quantize your model. Going from float32 to int8 actually gives you at least a 4x increase, maybe even more depending on how optimized your hardware is. At least for me, that was the gain I needed. Then I didn't bother writing it in Rust, but I did in C++. Probably you can go even faster, but it depends a bit on how good the compiler for the language is as well. I'm not really a compiler person, but I think there are differences in how well they optimize.
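
A minimal sketch of the float32-to-int8 step he mentions, using PyTorch's post-training dynamic quantization (model and sizes are made up):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Weights are stored as int8 and activations are quantized on the fly:
# roughly 4x smaller weights and typically faster CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(1, 256))
```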

 

Recorded at: Mar 04, 2022
