
James Clark on How Ballerina Handles Network Interaction, Data, and Concurrency

Nov 29, 2021

Charles Humble discusses the design of the Ballerina programming language with its lead designer James Clark. They discuss how the goals of the language inform a number of design choices including: the type system, error handling, the concurrency model, and the language’s built-in support for visualization of program flows.

Key Takeaways

Ballerina is designed to be used for real-world cloud native applications with an initial focus on enterprise integration. It is designed to produce programs that are easy to maintain, favoring this over speed of initial creation.
The language is not object-oriented. It uses a structural rather than nominal type system, and explicit error handling without the use of exceptions.
It is intended to make dealing with network interactions, data, and concurrency straightforward.
The concurrency model includes support for lightweight virtual threads and named workers. It supports a graphical representation of the Ballerina syntax tree in the form of sequence diagrams, meaning that there's no abstraction between the visualization and the code that runs on the platform.


Introductions

Hello, and welcome to The InfoQ Podcast. I'm Charles Humble, one of the co-hosts of the show, and editor-in-chief at cloud native consultancy firm Container Solutions. My guest this week is James Clark. James has been contributing to the open source community for nearly 20 years and is probably best known as a pioneer of SGML and XML. More recently, he's been working for WSO2, which is an open source technology provider founded in 2005, where he is the lead designer for the Ballerina language. Ballerina is an open source programming language for the cloud that makes it easier to use, combine and create network services. It is, I think, a really interesting language that joins others, like Go and Rust and Dart, as being languages that have been developed for this cloud native era, and so it's Ballerina that will be the focus for this podcast. James, welcome to The InfoQ Podcast.

James Clark: Hello, thank you for having me. I think it's actually nearly 30 years I've been doing this sort of thing, actually. I'm getting on a bit now.

Charles Humble: Is it 30?

James Clark: Yeah, I think it's nearly 30 now.

How has the shift to cloud changed programming? [01:20]

Charles Humble: Nearly 30 years, excellent. Well, there we are. So, I said in my introduction that Ballerina's been conceived during this new cloud native era. So, how has the shift to cloud changed programming in your view?

James Clark: I think it changes, in a fairly big way, what the main tasks of a program are. I started programming in the pre-cloud era, and if you think about how a program got things done in that era, you had Perl and it was, you read files, you write files and you have APIs, but the APIs are calls to libraries. They're calls to maybe shared libraries or C libraries, they're all on the same machine. So, APIs are library APIs and you get stuff done by writing files basically. Whereas in the cloud era, you get stuff done by consuming and providing network services and the APIs that matter are primarily network APIs. So, they're sending network messages, typically HTTP, typically JSON. So, what an API is and what the main business of a program is are very different.

I guess another aspect is concurrency. In the traditional C world, if you like, most application programs don't really need to worry about it. The operating system does, but the application program can just forget about it. Whereas in the cloud, it's pretty pervasive. You've really got to think about it. You can't completely hide concurrency from the application programmer.

What are the specific goals you have for Ballerina? [02:42]

Charles Humble: And then given that context, what is the specific goal or goals you have for Ballerina? What is it that you're trying to accomplish with the language?

James Clark: This isn't an academic exercise. So, it's very much designed to be a pragmatic language and it's not designed ... we don't want it to be a niche language. We have ambitious goals. We want it to be something that is capable of being a mainstream language. So, I guess the initial target is we want it to be a good way to do enterprise integration and you can start off with a narrow objective, which is we want it to be a good way to do enterprise integration. Good compared to the traditional way, which is an ESB plus some Java. So, basically the basic concept is okay, instead of having a DSL, typically a DSL, with XML syntax that has networky stuff that's all about providing endpoints and messages and routing. Instead of having that plus Java, let's just have one unified language that you can do it all in. You don't have to have these two languages, you don't have to have this split, and you could just do it all in one unified way and in a way that works, is a good fit, for the needs of the cloud.

It feels very much like a language that's been designed to favor ease of maintainability over speed of initial implementation. [03:56]

Charles Humble: When I first started looking at the language, one of the things I was struck by was that everything is very explicit. So, for example, there are no implicit conversions between integers and floating point values. An integer overflow causes a runtime exception and so on. It feels very much like a language that's been designed to favor ease of maintainability over speed of initial implementation, initial typing. Would you agree with that? Is that a fair assessment?

James Clark: Absolutely. It's a kind of, I mean, if I say enterprisey mindset, not everybody would understand that in a positive way. I mean, programs are read, I mean, real programs that businesses rely on, serious programs, they have a long lifespan. You've got that crufty COBOL code that's 20 or 30 years old. They're read for years and decades, where being able to read it and understand it is far more important than being able to save a few keystrokes when you're typing it up.

So, it's definitely a fundamental design goal of Ballerina to favor maintainability, favor explicitness, avoid surprises, and also to leverage familiarity and leverage what people know. The percentage of working programmers who know one of JavaScript, C, C#, Java, C++ is pretty high. So we want to leverage that so that if you know one of these languages and you look at a chunk of Ballerina code, you should have a pretty good idea what's going on. You may not know every little detail of the semantics of the language, but you can look at it and even knowing zero Ballerina, you should have a reasonable sense of what's going on. It should not be mysterious.

You don't use exceptions? [05:30]

Charles Humble: Right, and that shows up, for example, in the way you do error handling, you don't use the exception method, which is common in languages like Java.

James Clark: I think there's a significant trend in modern programming languages, away from exceptions towards more explicit error handling, where you return error values. So, you see that in Go and in Rust, error handling is explicit. You do have exceptional flow, which are panics, but those are for the really, truly exceptional case. But regular error handling, and it's a normal part of network programming to have errors, that is dealt with explicitly. You see the control flow explicitly in the program, just like regular control flow. That again is a thing that's a little bit more inconvenient to write, but the poor old maintenance programmer who's coming to look at it can see what are the possible control flows and is going to have a much easier time not screwing up when they try and fix bugs.

Charles Humble: Right, yes. I found this really interesting because most of my professional programming was done in Java. Obviously Java has its concept of checked exceptions, which I think were trying to solve the same problem, but because they allow you to throw a runtime exception, most people do that. So the checked exceptions didn't really work well but I think it's trying to do the same sort of job.

James Clark: Yes, and checked exceptions are not quite the same. It shows you which are the possible exceptions, but still, when you call a function, when you see a function call there, you don't see in the call that there is the possibility of it throwing exceptions. You see that there is a possibility of checked exceptions, but it's not explicit in the syntax. So every time you see a function call, you've got to go, "Oh, what are the possible exceptions that it can throw, and how is this going to affect the flow of control in the function?" That just makes it harder to figure out what's going on.
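To make the contrast concrete, here is a minimal sketch of the "errors are ordinary return values" style discussed above. It is written in Python rather than Ballerina, and the names Failure, parse_port and describe are hypothetical, chosen only for illustration. The point is that the error path shows up as a visible branch at the call site instead of as hidden control flow.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Failure:
    """An ordinary value describing what went wrong (hypothetical name)."""
    message: str


def parse_port(text: str) -> Union[int, Failure]:
    """Return the port number, or a Failure value instead of raising."""
    if not text.isdigit():
        return Failure(f"not a number: {text!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        return Failure(f"out of range: {port}")
    return port


def describe(text: str) -> str:
    result = parse_port(text)
    # The error path is an explicit branch that a reader can see
    # right here at the call site.
    if isinstance(result, Failure):
        return f"error: {result.message}"
    return f"listening on {result}"
```

A reader of describe can tell from the code alone that parse_port may fail and what happens when it does, which is the maintainability argument being made in the conversation.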

What is it that makes Ballerina distinct? [07:14]

Charles Humble: Now you've said that Ballerina isn't a research language, it's intended to be used in industry for real-world applications. And given that you haven't invented any particularly new ideas from a language point of view, so what is it that makes it distinct?

James Clark: I mean, there are several different dimensions, but we're starting off from the proposition that what we want to do is make the things that a program has to do today easy. So, consume and provide network services, work with data, we want to make those easy.

That's one dimension. Another dimension is that we are trying to provide an alternative to a combination of a DSL and Java. One of the things you can do with the DSL is you can have a graphical view and the graphical view that you get from the DSL, isn't just the syntax, it's not just a syntactic view, it is actually showing you meaningful things about the flow of messages that is possible within your application.

So, one of the goals is that you should be able to take a Ballerina program and from that, have a graphical view that is not just syntactic, it's not just giving you the classes or the functions. It is actually giving you real insight into how your application is interacting with the network. It does that by leveraging sequence diagrams.

I think part of WSO2's experience doing enterprise integration for 15 years is that when you talk to customers and they want to explain how is this application supposed to be working? What they do is they sit down, they write a little sequence diagram, draw a little sequence track, and that gets everybody on the same page.

So, one of the key features in Ballerina is that you can click on it and you can look at every function as a sequence diagram, and you can see the flow of messages in that function. That deeply affects the syntax and it deeply affects the concurrency model, and that's something you couldn't graft onto any other programming language. So, when you get this, it's a two-way model, so you can edit the sequence diagram, or you can edit the code and they're two views of the same thing. You can almost think of it as an alternative syntax for the high level layers of the language.

The graphical representation is the actual Ballerina syntax tree [09:20]

Charles Humble: Right, yes. That I think, is really interesting, and it is something that I think is unique. So, the graphical representation is a graphical representation of the actual Ballerina syntax tree, meaning that there's no abstraction, there's no translation between the visualization and the code that runs on the platform. If you edit the diagram, you're editing the code and vice versa. So, you can genuinely roundtrip between low code and normal code as it were.

James Clark: Exactly. There's no possibility of them getting out of sync, as we are now.

Just for listeners, we are on a different continent, so there's a little bit of a lag. But there's no possibility with these two views of that getting out of sync and that comes from having thought about this. You couldn't do this just by drawing a pretty layer on top of another language. It's designed in and it's designed into the concurrency model. The way we do concurrency is somewhat limited, and we know we need fancier stuff for the cases it doesn't handle. For the cases it does handle, it provides a more controlled environment, you can really see what's going on and there's a much easier way to do things.

How does a network API differ from a traditional object-oriented API? [10:22]

James Clark: So, that's possibly a unique thing, but I think there's deeper stuff too, which goes back to the point about APIs. What is an API in a cloud era language? A cloud era language, it's a network API. So, how does a network API differ from a traditional object-oriented API?

Well, I think one of the big differences is that you want to do more in each network roundtrip, roundtrips are expensive. So, typically, the parameters to your network APIs are often complex structures. So, you're sending data, whereas in your object-oriented API, it might be set this, set that, and it's a boolean or an int or something. You don't do your network APIs like that and instead, what you want to send in each of your API calls is a complex structure. Typically JSON, which has deep nesting, typically a tree structure. That's one thing.

Another thing is that the different parts of your service are going to be in different languages. One of the beautiful things about microservices is you're not constrained to use one language for everything. You don't have to have a common API or anything. Each bit of your system can be written in whatever language is best suited for that bit. But that means that your messages, your data, the parameters if you like to your API, you want those to be data. You want them to be pure data. You don't want them to be objects. You want to send around chunks of JSON that can be interpreted by whatever language each microservice is using. So, that means a lot of what you want to do in an integration language is deal with this data. It's not objects, it's just data. You want to take the message you got from one, and you want to transform it a bit, and you want to send it off to somewhere else, or you want to combine two of them together into another message and send it off to several different people. That's not the object-oriented way.

The object-oriented way is about combining code and data into objects and keeping the data hidden within them. That's the opposite of what you want to do when you're dealing with network APIs, where you want to expose the data. Well, it may be hidden within your servers, but you're exposing the data in the messages that you're sending, it's very much exposed.

What is plain data? [12:24]

Charles Humble: Right, yes. I mean, you have this concept of plain data in Ballerina. I was trying to think, I'm not quite sure where that term originates, but could you maybe try and give us a bit of a definition of what plain data actually is in Ballerina?

James Clark: It's a term that actually comes from, I think it's a C++ term, POD, plain old data. It's just data that doesn't have any methods attached to it. It doesn't make any assumptions as to how it's going to be processed. It's programming language independent, and it's therefore inherently mobile. You can serialize it, you can copy it. It's just data. If you're going to try sending functions around, that's not so easy.

Charles Humble: And then once you've got your plain data, you can presumably do things like deep copy, deep equality, serialization, de-serialization and so forth, right?

James Clark: All that happens for free. Also you can serialize it as JSON, you don't need to pre-agree ... I mean, if you try to start serializing objects, then you need to agree with the recipient about what are the objects you're going to send. You have to agree on what you're going to call them, all that sort of stuff. But plain data is just much more flexible and has much less coupling and allows your services to be much more loosely coupled. It doesn't create coupling between your services.
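As a rough illustration of what "for free" means here, the following Python sketch (not Ballerina) treats a nested dictionary as plain data. The order record and its field names are invented for the example. Because the value carries no methods, generic deep copy, deep equality and JSON serialization all work without any per-type code.

```python
import copy
import json

# Plain data: nested maps, lists and scalars, with no behavior attached.
# The record itself is a made-up example.
order = {"id": 42, "items": [{"sku": "A-1", "qty": 2}], "gift": False}

# Deep copy: the clone shares no structure with the original.
clone = copy.deepcopy(order)
clone["items"][0]["qty"] = 3

# Deep equality: == compares the whole tree, so the edit is detected.
changed = clone != order

# Serialization and de-serialization need no agreement on class names.
wire = json.dumps(order)
received = json.loads(wire)
round_trips = received == order

print(changed, round_trips)
```

The receiving side only needs a JSON parser, not a shared class definition, which is the loose coupling described above.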

It's a statically typed language, but the type system is primarily structural, as distinct from nominal, right? [13:34]

Charles Humble: And I think that's kind of reflected in your type system as well. So, it's a statically typed language, but you've got looser coupling than some other statically-typed languages, including things like Java that people are probably familiar with. You have built-in support for JSON and XML. But as I say, the type system is, it's primarily structural, as distinct from nominal, right?

James Clark: Yes, it's a structural type system. It works in some ways a little bit more like a schema language. You can almost think of the type system as really doing double duty. We are both using it to constrain how, or to check how, the operations have been done within the program. But we're also using it to describe the network interfaces to the program. So when you write Ballerina types, those can also be used to generate schemas for the network interface. So you can generate a GraphQL schema, or OpenAPI, from the types. You write the types once and those types are used both to generate that schema and also to manipulate, just like a regular type system within the program.

I think one of the things that makes life very difficult for a modern programmer is you have to continually switch between different worlds. There's a bit of HTML, a bit of SQL, they've got all these different things and they have to manually switch gears, going, "Okay, this is how it works in GraphQL, this is how it works in SQL, this is how it works in my language's type system," and deal with the various impedance mismatches between them themselves. Whereas in Ballerina, you can just express the thing once in the Ballerina type and because it's almost like a schema, you can map it onto your GraphQL type and you can use it like a regular type system. It also has something called semantic subtyping, which means that you can think of a type as being a set of values and you can think of the subtype relationship as corresponding to the subset relationship between those sets, which is something that you see in some schema languages.

So, you can use Ballerina types to basically describe what's on the wire. So, you can have features like say, the particular field is optional. That happens all the time in JSON, that you may or may not have this particular field in an object, but most type systems, you don't have that. You have defaults, but that's not the same thing. You're able to describe what's there. Or you can say, "You can just have this or this." Again, that's an absolutely basic thing when you think of it as a schema, but most languages don't do that. You can't say, "Well, it's either this, or this." You've got to say, "You've got to have some sort of type hierarchy or something." I mean, TypeScript can do it, and probably TypeScript is the closest of mainstream languages in terms of how the type system works. Because again, what TypeScript is doing is it's describing JavaScript values and JavaScript values are pretty close to JSON, so you can think of TypeScript as basically describing JSON values.
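The two schema-like features mentioned here, an optional field and "either this or this", can be sketched as a small structural check. This is a Python approximation, not Ballerina's type system: the PERSON shape, its field names and the conforms helper are all hypothetical, and the check runs at run time, whereas Ballerina is described above as statically typed.

```python
from typing import Any, Dict, Tuple

# A shape maps each field name to (allowed Python types, required?).
# This encoding is an assumption made for the sketch.
Shape = Dict[str, Tuple[Tuple[type, ...], bool]]

PERSON: Shape = {
    "name": ((str,), True),
    "email": ((str,), False),   # optional: the field may be absent
    "id": ((int, str), True),   # union: either an int or a string
}


def conforms(value: Any, shape: Shape) -> bool:
    """Structural check: only the fields and their types matter."""
    if not isinstance(value, dict):
        return False
    for field, (allowed, required) in shape.items():
        if field not in value:
            if required:
                return False
            continue  # an absent optional field is fine
        if not isinstance(value[field], allowed):
            return False
    return True  # fields not named in the shape are ignored here
```

Note that an absent "email" is accepted while an "email" of None is rejected, which is the difference between an optional field and a default value that the conversation points out.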

So in a way, it has some similarity to TypeScript. TypeScript is very much tied to JavaScript, which has an anything-goes, free and easy, very dynamic view of the world. Whereas Ballerina is trying to catch your errors. Eventually we want to be able to compile things into a nice executable. At the moment, the current implementation is based on Java, but that's not part of the language. That's just the current implementation and we plan to have a native implementation where everything's statically compiled.

Can you describe the support for lightweight threads - Strands. They are run time managed, logical threads of control, right? [16:37]

Charles Humble: I want to return briefly to the concurrency model. We talked about it in the context of the sequence diagrams, the visual aspects, but I'd like to talk a little bit more about the concurrency model in general. So, one thing is that you support lightweight threads. You refer to these as strands, and these are analogous, I think, to virtual threads in Java's Project Loom, which we've talked about in a previous episode with Ron Pressler on the podcast, so I'll link to that in the show notes. But basically, these strands are run time managed, logical threads of control, right?

James Clark: Exactly. They're logical threads. I mean, it's very fashionable these days to do everything with asynchronous programming, but I think that makes life awfully hard for the programmer, this whole async thing, and promises, and all that sort of stuff, is just a layer of complexity for the poor application programmer. I think a thread model, where you just present a very simple logical model to the programmer, and it's up to the implementation to turn that into something efficient, I think that's a better model for the programmer. Fundamentally, threads are a better programming model, I feel.

How do strands enable cooperative multitasking? [17:36]

Charles Humble: How do strands enable cooperative multitasking?

James Clark: Well, so a strand, as you say, is a logical thread. So, these things logically run concurrently. Whether they're actually running on two threads or not depends on whether the compiler can figure out that it's safe to do so. So, we have locking constructs, and if you haven't used the locking construct, then the compiler will figure out that, "Okay, we can't run these things in parallel." So, it will switch between them, so you never get two things running on two different threads. So, data races can't happen.

So in the worst case, it will just run on one thread. Typically when you do some IO, that logical strand will block waiting to do the IO and another logical thread will start running. Because in a lot of cases within network programming, it's really the IO that really matters. So, so long as you can ... if you've got to go out and, I don't know, go to five different web services and get results from them and then compute your result from that, what's important is you don't do one, wait for that, then do the other one, wait for that. You want to be waiting at the same time. Whether you are actually using multiple cores is not such a big deal, it's more about having the IO work in parallel.
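The "five web services" case can be sketched as follows. This uses Python operating-system threads, not Ballerina strands, and the fetch function only sleeps to stand in for a network call; the service names are invented. What it shows is the property described above: each caller is written as simple blocking code, yet the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch(service: str) -> str:
    """Stand-in for a network call: sleeping simulates waiting on IO."""
    time.sleep(0.05)
    return f"{service}: ok"


def fetch_all(services: List[str]) -> List[str]:
    # One thread per call, so all five waits happen at the same time.
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        return list(pool.map(fetch, services))


SERVICES = ["users", "orders", "stock", "prices", "shipping"]

start = time.perf_counter()
results = fetch_all(SERVICES)
elapsed = time.perf_counter() - start

# Run one after another, the five calls would need at least 0.25 s.
print(f"{len(results)} results in {elapsed:.2f} s")
```

No extra cores are needed for this to pay off, since the threads spend their time blocked on IO rather than computing.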

A function can be defined by multiple named workers. Can you describe how that works? [18:44]

Charles Humble: Then to bring this back to the visual aspects, the concurrent operations in a function can be defined by multiple named workers. Can you describe that and how that works for us?

James Clark: Right, so within a function, and this is going back to the sequence diagram model, but at the top level within a function, you can have blocks that are named workers and those blocks run in parallel and they can exchange messages with each other and the messages are matched up at compile time to make sure that everyone ... So, if you think of a sequence diagram, your arrows match up. So, for the arrows in the sequence diagram, there'll be a send in one block and a receive in the other block. In order to be able to build that sequence diagram, we check that the sends and the receives in each block match up.

It's probably easier with a picture. I mean, in a sequence diagram, you have a send and a receive, and if one block is sending and one block is receiving, in order to be able to draw a diagram with an arrow from one to the other, you've got to have a send in one and a receive in the other, or vice versa. So, that is part of the semantics of the language, that you can match those lines up. What the static checking does is check that your sends and receives do match up, so that every line has a send and a receive.

Which is a somewhat restrictive model, but when it applies to your problem, you detect a lot of problems at compile time that would not otherwise be detectable. You also get the diagram that actually gives you real insight into what your program is doing in terms of network interaction.
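A rough analogue of two named workers exchanging messages, written with Python threads and queues rather than Ballerina's worker syntax. The names worker_a, worker_b and the two queues are hypothetical. Each put is paired with exactly one get on the same queue, which corresponds to one arrow in the sequence diagram.

```python
import queue
import threading
from typing import List


def run() -> int:
    a_to_b: "queue.Queue[int]" = queue.Queue()
    b_to_a: "queue.Queue[int]" = queue.Queue()
    result: List[int] = []

    def worker_a() -> None:
        a_to_b.put(20)               # arrow 1: A sends to B
        result.append(b_to_a.get())  # arrow 2: A receives from B

    def worker_b() -> None:
        n = a_to_b.get()             # arrow 1: B receives from A
        b_to_a.put(n * 2 + 2)        # arrow 2: B sends to A

    threads = [threading.Thread(target=worker_a),
               threading.Thread(target=worker_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


print(run())
```

Python does not check the pairing: delete one put and the matching get simply blocks forever at run time. That kind of mismatch is what the compile-time matching described above is meant to catch before the program runs.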

What's next for Ballerina? [20:07]

Charles Humble: What's next for Ballerina? What are you currently working on?

James Clark: Currently, we are working on getting the Java version. We're just in the process of finishing up the beta for the major release, we have this rather ... We call it Swan Lake, which is the first-

Charles Humble: Oh, I'm loving this.

James Clark: Yeah, next one's going to be Nutcracker.

Charles Humble: Excellent, so you're working your way through all the Tchaikovsky ballets. You need a Sleeping Beauty and a Romeo and Juliet, I think.

James Clark: Anyway, so we're starting on Swan Lake. The idea is that it really represents the language being mature, not the language being perfect, there's tons of stuff we can do, but we've got a fairly comprehensive set of language features, which we're happy with. There's plenty of stuff we want to do, but it's a solid base and we'll have a Java-based implementation that is a solid implementation of those features. So we're just finishing that up.

I guess several things are going in parallel, but I think that the next thing is, which is what I'm working on, is doing a native implementation. So targeting LLVM, being able to use native executables that don't have any dependency on Java. We're also, which is interesting, doing it in Ballerina. So, we're trying to write a Ballerina compiler in Ballerina.

This is not what Ballerina's designed for. Obviously, Ballerina's designed for writing relatively small programs to do enterprise integration. So, using it to write a compiler is pushing it to its limits but I think that that's good because it pushes the current implementation and it pushes the language. I think one of the goals of Ballerina is that you shouldn't run into a wall. You should be able to start small and as your program grows, Ballerina will grow with you. I think if we can write a compiler in Ballerina, then whatever integration problem you need to solve, you can be confident that Ballerina will have sufficient horsepower to be able to do it.

If listeners want to find out more about the language, where's the best place for them to get started? [21:52]

Charles Humble: If listeners want to find out more about the language and maybe have a play with it and see what they think, where's the best place for them to get started?

James Clark: The ballerina.io website.

Charles Humble: Nice, easy answer.

James Clark: Nice, easy answer. Everything's linked to from that.

Charles Humble: Excellent. All right. I shall include a link to that in the show notes and with that, James, thank you very much indeed for joining me today on The InfoQ Podcast.

James Clark: Thank you for having me, I enjoyed our conversation.
