Ballerina: a Data-Oriented Programming Language

Key Takeaways

  • Ballerina’s flexible type system brings the best of statically typed and dynamically typed languages in terms of safety, clarity, and speed of development.
  • Ballerina treats data as a first-class citizen that can be created without extra ceremony, just like strings and numbers.
  • Ballerina features a rich query language that enables developers to express business logic and data manipulation with eloquence.
  • In Ballerina records, fields can be either mandatory or optional.
  • Ballerina supports JSON out of the box.

In the information systems I have built over the past decade, data is exchanged between programs like frontend applications, backend servers, and service workers. Those programs use exchange formats, like JSON, to communicate over the wire.

Over the years, I have noticed that a program’s complexity depends not only on the complexity of the business requirements but also on the approach I take to represent data inside my programs.

In statically-typed languages (like Java, C#, Go, OCaml, or Haskell), it seems natural to represent data with custom types or classes, while in dynamically-typed languages (like JavaScript, Ruby, Python, or Clojure), we usually use generic data structures, like maps and arrays.

Each approach has its benefits and costs. When we represent data with static types, we get great support from our IDE and safety from our type system, but it makes the code more verbose and the data model rigid.

On the other hand, in dynamically-typed languages, we represent data with flexible maps. This allows us to quickly write small to medium-sized programs without any ceremony, but we are operating in the wild: our IDE doesn’t help us autocomplete field names, and when we mistype a field name, we get a runtime error.

Ballerina’s refreshing approach to types 

Until I discovered Ballerina, I thought that this trade-off was an inherent part of programming that we were forced to live with. But I was wrong: it’s possible to combine the best of both worlds. It’s possible to move fast without compromising on safety and clarity. It’s possible to benefit from a flexible type system.

I cannot afford to walk, it's too slow.

I am scared to run, it's too risky.

I want to flow with ease and confidence. Like a ballerina.

Data as a first-class citizen

When we write a program that manipulates data, it’s preferable to treat data as a first-class citizen. One of the privileges of first-class citizens is that they can be created without extra ceremony, just like numbers and strings. 

Unfortunately, in statically-typed languages, data doesn’t usually have the privilege of being created without ceremony. You need to use a named constructor to create data. When data is not nested, the absence of data literals is not too cumbersome. For example, here is how we create a library member named Kelly Kapowski who is 17 years old.

Member kelly = new Member(
  "Kelly",
  "Kapowski",
   17
);

But with nested data, the usage of a named constructor becomes verbose. Here is what data creation looks like when we include the list of books that Kelly currently holds, assuming a simplistic library data model, where a book has only a title and an author.

Member kelly = new Member(
  "Kelly",
  "Kapowski",
   17,
   List.of(
       new Book(
         "The Volleyball Handbook",
         new Author("Bob", "Miller")
       )
    )
);

In dynamically-typed languages, like JavaScript, the usage of data literals makes it much more natural to create nested data.

var kelly = {
   firstName: "Kelly",
   lastName: "Kapowski",
   age: 17,
   books: [
            {
                title: "The Volleyball Handbook",
                author: {
                    firstName: "Bob",
                    lastName: "Miller"
                }
            }
       ]
};

The problem with the dynamically-typed languages' approach to data is that data is untamed. The only thing that you know about your data is that it’s a nested map. As a result, you need to rely on documentation to know what kind of data you have in hand. 

The first thing I appreciated in Ballerina is that it gave me the ability to create my custom types while keeping the convenience of creating data via data literals. 

In Ballerina, like in a statically-typed language, we create our custom record types to represent our data model. Here is how we create Author, Book, and Member record types:

type Author record {
   string firstName;
   string lastName;
};

type Book record {
   string title;
   Author author;
};

type Member record {
   string firstName;
   string lastName;
   int age;
   Book[] books;
};

And in Ballerina, like in dynamically-typed languages, we create data with data literals. 

Member kelly = {
        firstName: "Kelly",
        lastName: "Kapowski",
        age: 17,
        books: [
            {
                title: "The Volleyball Handbook",
                author: {
                    firstName: "Bob",
                    lastName: "Miller"
                }
            }
       ]
    };

Of course, like in a traditional statically-typed language, the type system lets us know when we have missed a field in a record. Our code won’t compile, and the compiler will tell us exactly why.

Author yehonathan = {
   firstName: "Yehonathan"
};
ERROR [...] missing non-defaultable required record field 'lastName'

In VSCode, when the Ballerina extension is installed, you get notified about the missing field as you type.

Now, you’re probably asking yourself whether Ballerina’s type system is static or dynamic. Let’s take a look.

Ballerina’s flexible type system

In a data-oriented program, enriching data with calculated fields is quite common. For example, suppose I want to enrich a piece of author data with a field called fullName that holds the author's full name. 

In a traditional statically-typed language, I’d need to create a new type for this enriched piece of data, maybe a new type called EnrichedAuthor. In Ballerina, that’s not required; the type system allows you to add record fields on the fly, using the bracket notation, like in a dynamically-typed language. For example, here is how we add a fullName field to an Author record:

Author yehonathan = {
   firstName: "Yehonathan",
   lastName: "Sharvit"
};

yehonathan["fullName"] = "Yehonathan Sharvit";


I find this capability quite amazing. In a sense, Ballerina allows us -- the developers -- to have our cake and eat it too, by elegantly introducing a semantic difference between two different notations:

  1. When we use the dot notation to access or modify a record field, Ballerina gives us the same safety and help we are used to in statically-typed languages.

  2. When we use the bracket notation to access or modify a record field, Ballerina gives us the same flexibility we benefit from in dynamically-typed languages.
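To make the distinction between the two notations concrete, here is a minimal sketch that reads back a field added through the bracket notation. It assumes the open Author record defined earlier; the main function and the variable names are only illustrative. Because fullName is not declared in the record type, the compiler only knows that the value is anydata, so the sketch narrows it with an is check before calling a string function.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

public function main() {
    Author author = {firstName: "Yehonathan", lastName: "Sharvit"};

    // Bracket notation: add a field that the Author type does not declare.
    author["fullName"] = "Yehonathan Sharvit";

    // Dot notation: firstName is a declared field, statically typed as string.
    string first = author.firstName;
    io:println(first.toUpperAscii()); // YEHONATHAN

    // Bracket notation: the field is undeclared, so its static type is anydata.
    anydata fullName = author["fullName"];
    if fullName is string {
        // Inside this branch, fullName is narrowed to string.
        io:println(fullName.toUpperAscii()); // YEHONATHAN SHARVIT
    }
}
```

The flexibility comes at a price: values read through the bracket notation must be narrowed before they can be used as a specific type.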

In some cases, we want to be stricter and disallow the addition of fields completely. No problem: Ballerina supports closed records. The syntax of closed records is similar to the syntax of open records, except that the field list is enclosed within two | characters. 

type ClosedAuthor record {|
   string firstName;
   string lastName;
|};

ClosedAuthor yehonathan = {
   firstName: "Yehonathan",
   lastName: "Sharvit"
};

The type system doesn’t let you add a field to a closed record.

yehonathan["fullName"] = "Yehonathan Sharvit";
ERROR [...] undefined field 'fullName' in 'ClosedAuthor'

Ballerina also supports optional fields in records via the question mark sign. In the following record, the author’s first name is optional.

type AuthorWithOptionalFirstName record {
   string firstName?;
   string lastName;
};

When you access an optional field in a record, you need to make sure you properly handle the case where the field is not present. In traditional dynamically-typed languages, the absence of a static type checker makes it too easy to forget to handle that case. Tony Hoare introduced null references in 1965 in a programming language called ALGOL W, and he later called it his billion-dollar mistake.

In Ballerina, the type system is there for you. Suppose you want to write a function that uppercases an author’s first name.

function upperCaseFirstName(AuthorWithOptionalFirstName author) {
   author.firstName = author.firstName.toUpperAscii();
}

This code won’t compile: the type system (and the Ballerina VSCode Extension) will remind you that there is no guarantee that the optional field is there.

ERROR [...] undefined function 'toUpperAscii' in type 'string?'

So how do we fix our code to handle the absence of the optional field properly? It’s quite simple; after you access the optional field you check if it’s there or not. In Ballerina, the absence of a field is represented by (). 

function upperCaseFirstName(AuthorWithOptionalFirstName author) {
   string? firstName = author.firstName;
   if (firstName is ()) {
       return;
   }
   author.firstName = firstName.toUpperAscii();
}

Note that no type casting is needed. The type system is smart enough to understand that the variable firstName is guaranteed to be a string after we have checked that firstName is not ().

Another aspect of the Ballerina type system that I find very useful, in the context of data-oriented programming, is that record types are only defined via the structure of their fields. Let me clarify.

When we write a program that manipulates data, most of our codebase is made of functions that receive data and return data. Each function has requirements about the shape of the data it receives. 

In statically-typed languages, those requirements are expressed as types or classes. By looking at a function signature, you know exactly what the data shape of the function arguments is. The problem is that it sometimes creates a tight coupling between the code and the data. 

Let me give you an example. Suppose you want to write a function that returns the full name of an author, you would probably write something like this:

function fullName(Author author) returns string {
   return author.firstName + " " + author.lastName;
}

The limitation of this function is that it only works with records of type Author. I find it a bit disappointing that it doesn’t work with Member records. After all, a Member record also has firstName and lastName string fields.

Side Note: Some statically-typed languages allow you to overcome this limitation by creating data interfaces.

Dynamically-typed languages are much more flexible. In JavaScript, for instance, you’ll implement the function like this:

function fullName(author) {
  return author.firstName + " " + author.lastName;
}

The function argument is named author, but in fact, it works with any piece of data that has firstName and lastName string fields. The problem is that when you pass a piece of data that doesn’t have one of these fields, you get a run-time exception. Moreover, the expected data shape of the function arguments is not expressed in the code. So, to know what kind of data the function expects, we have to either rely on documentation (which is not always up to date) or investigate the code of the function.

Ballerina’s flexible type system allows you to specify the shape of your function arguments, without compromising flexibility. You can create a new record type, which only mentions the record fields the function needs in order to work properly.

type Named record {
   string firstName;
   string lastName;
};

function fullName(Named a) returns string {
   return a.firstName + " " + a.lastName;
}

PRO TIP: You can use an anonymous record type to specify the shape of your function arguments.

function fullName(record {
                     string firstName;
                     string lastName;
                 } a)
               returns string {
   return a.firstName + " " + a.lastName;
}

You are free to call your function with any record that has the required fields, whether it’s a Member or an Author, or any other record that has the two string fields that the function expects. 

Member kelly = {
        firstName: "Kelly",
        lastName: "Kapowski",
        age: 17,
        books: [
            {
                title: "The Volleyball Handbook",
                author: {
                    firstName: "Bob",
                    lastName: "Miller",
                    "fullName": "Bob Miller" // not declared in Author, so the key is written as a string
                }
            }
       ]
    };

fullName(kelly);
// "Kelly Kapowski"

fullName(kelly.books[0].author);
// "Bob Miller"

Here is an analogy that I find useful to illustrate Ballerina’s approach to types: Types are like eyeglasses that we use in our programs to look at reality. But we need to remember that what we see through our lenses is only an aspect of reality. It is not the reality itself. Like the idiom says: the map is not the territory.

For instance, it is not accurate to say that the function fullName -- defined above -- receives a Named record. It is more accurate to say that the function fullName decides to look at the data it receives through the lenses of a Named record.

Let's look at another example. In Ballerina, two records of different types that have the exact same field values are considered equal.

Author yehonathan = {
   firstName: "Yehonathan",
   lastName: "Sharvit"
};
AuthorWithBooks sharvit = {
   firstName: "Yehonathan",
   lastName: "Sharvit"
};
yehonathan == sharvit;
// true

At first, this behavior surprised me. How could two records of different types be considered equal? But when I thought about the eyeglasses analogy, it made sense to me:
The two types are two different lenses that are looking at the same reality. In our programs, what matters the most is the reality, not the lenses. Sometimes, traditional statically-typed languages seem to put more emphasis on the lenses than on reality.
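The comparison above can be tried as a small standalone program. The article does not show the definition of AuthorWithBooks, so the one below, with an optional numOfBooks field, is an assumption made for this sketch.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

// Assumed definition: the original article does not show this type.
type AuthorWithBooks record {
    string firstName;
    string lastName;
    int numOfBooks?;
};

public function main() {
    Author yehonathan = {firstName: "Yehonathan", lastName: "Sharvit"};
    AuthorWithBooks sharvit = {firstName: "Yehonathan", lastName: "Sharvit"};

    // == compares the values field by field, regardless of the declared types.
    io:println(yehonathan == sharvit); // true

    // Once the two values differ in one field, they are no longer equal.
    sharvit.numOfBooks = 1;
    io:println(yehonathan == sharvit); // false
}
```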

So far, we have seen how Ballerina leverages types so that they are not in our way, but rather assist us on our way to make our development workflow more effective. Ballerina goes one step further and allows us to manipulate data in a powerful and convenient way via an expressive query language.

The power of an expressive query language

As an adept of functional programming, I rely on higher-order functions like map, filter, and reduce as my “bread and butter” when I need to manipulate data. Ballerina supports functional programming, but the idiomatic way to deal with data manipulation in Ballerina is via its expressive query language, which allows us to express business logic with eloquence.

Suppose we have a collection of records, and we only want to keep the records that satisfy a certain condition and enrich those records with a calculated field. For instance, let’s say we only want to keep books whose title contains the word “Volleyball”, and enrich them with the author's full name. 

Here is the function that enriches the Author record inside a book.

function enrichAuthor(Book book) returns Book {
   book.author["fullName"] = fullName(book.author);
   return book;
}

We could enrich our book collection using map, filter, and a couple of anonymous functions.

function enrichBooks(Book[] books) returns Book[] {
   return books.filter(function(Book book) returns boolean {
       return book.title.includes("Volleyball");
   }).
   map(function(Book book) returns Book {
       return enrichAuthor(book);
   });
}

But it’s quite verbose and a bit annoying to declare the types of the two anonymous functions. Using Ballerina’s query language, the code is more compact and easier to read.

function enrichBooks(Book[] books) returns Book[] {
   return from var book in books
       where book.title.includes("Volleyball")
       select enrichAuthor(book);
}

Ballerina query language will be covered in greater detail in our Ballerina series.

Before we move forward and talk about JSON, let’s write a little unit test for our function. In Ballerina, records are considered equal when they have the same fields and values, which makes it straightforward to compare the data a function returns with the data we expect.

Book bookWithVolleyball = {
   title: "The Volleyball Handbook",
   author: {
       firstName: "Bob",
       lastName: "Miller"
   }
};
Book bookWithoutVolleyball = {
   title: "Friendship Bread",
   author: {
       firstName: "Darien",
       lastName: "Gee"
   }
};
Book[] books = [bookWithVolleyball, bookWithoutVolleyball];
Book[] expectedResult =  [
           {
               title: "The Volleyball Handbook",
               author: {
                   firstName: "Bob",
                   lastName: "Miller",
                   "fullName": "Bob Miller" // not declared in Author, so the key is written as a string
               }
           }
 ];
enrichBooks(books) == expectedResult;
// true

PRO TIP: Ballerina comes with an out-of-the-box unit test framework.
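As a rough illustration, the comparison above could be written with the ballerina/test module as follows. This is a sketch under a few assumptions: the type and function definitions are repeated (with fullName inlined) so that the block stands on its own, the test function name is made up, and in a real package the test would normally live in the module's tests directory and be run with bal test.

```ballerina
import ballerina/test;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

function enrichAuthor(Book book) returns Book {
    // Inlined version of fullName, to keep the sketch short.
    book.author["fullName"] = book.author.firstName + " " + book.author.lastName;
    return book;
}

function enrichBooks(Book[] books) returns Book[] {
    return from var book in books
        where book.title.includes("Volleyball")
        select enrichAuthor(book);
}

// Hypothetical test name; @test:Config marks the function as a test case.
@test:Config {}
function testEnrichBooks() {
    Book[] books = [
        {title: "The Volleyball Handbook", author: {firstName: "Bob", lastName: "Miller"}},
        {title: "Friendship Bread", author: {firstName: "Darien", lastName: "Gee"}}
    ];
    Book[] expected = [
        {
            title: "The Volleyball Handbook",
            // fullName is not declared in Author, so the key is a string literal.
            author: {firstName: "Bob", lastName: "Miller", "fullName": "Bob Miller"}
        }
    ];
    // assertEquals compares the two values structurally, like ==.
    test:assertEquals(enrichBooks(books), expected);
}
```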

Now that we have seen the flexibility and ease that Ballerina provides around data representation and data manipulation inside a program, let’s see how Ballerina allows us to exchange data with other programs.

JSON support out-of-the-box

JSON is probably the most popular format for data exchange. Quite often, programs involved in information systems communicate by sending each other JSON strings. When a program needs to send data over the wire, it serializes a data structure into a JSON string. And when a program receives a JSON string, it needs to parse it to convert it to a data structure.

Ballerina, being a language designed for the cloud era, supports JSON serialization and JSON parsing out of the box. Any record can be serialized into a JSON string, as seen here:

AuthorWithBooks yehonathan = {
   firstName: "Yehonathan",
   lastName: "Sharvit",
   numOfBooks: 1
 };

yehonathan.toJsonString();
// {"firstName":"Yehonathan", "lastName":"Sharvit", "numOfBooks":1}

Conversely, a JSON string can be parsed into a record. Here, we need to be careful and handle the cases where the string is not valid JSON or doesn’t conform to the data shape we expect.

function helloAuthor(string authorStr) returns error? {
   Author|error author = authorStr.fromJsonStringWithType();
   if (author is error) {
       return author;
   } else {
       io:println("Hello, ", author.firstName, "!");
   }
}

PRO TIP: Ballerina embraces errors and lets us write the same logic more compactly via a special check construct.

function helloAuthor(string authorStr) returns error? {
   Author author = check authorStr.fromJsonStringWithType();
   io:println("Hello, ", author.firstName, "!");
}
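To see both failure modes in action, here is a small sketch that feeds three strings to the parser: one valid, one that is valid JSON but lacks a required field, and one that is not JSON at all. The input strings and the main function are made up for illustration, and the exact error messages depend on the Ballerina version, so they are not reproduced here.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

public function main() {
    string[] inputs = [
        "{\"firstName\": \"Yehonathan\", \"lastName\": \"Sharvit\"}", // valid Author
        "{\"firstName\": \"Yehonathan\"}", // valid JSON, but lastName is missing
        "not json" // not valid JSON
    ];

    foreach string input in inputs {
        // The target type (Author) is inferred from the declared variable type.
        Author|error result = input.fromJsonStringWithType();
        if result is error {
            io:println("Could not parse author: ", result.message());
        } else {
            io:println("Hello, ", result.firstName, "!");
        }
    }
}
```

Only the first input produces a greeting; the other two end up in the error branch.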

Side Note: JSON support in Ballerina goes far beyond serialization and parsing. In fact, Ballerina comes with a json type that allows you to manipulate data exactly like in a dynamic language. Advanced JSON in Ballerina will be covered later in our Ballerina series.

We have explored the benefits Ballerina provides around data representation, data manipulation, and data communication. We are going to conclude our exploration with an example of a mini data-oriented program that illustrates those benefits.

Final example: Manipulating data with ease and confidence

Imagine we’re building a Library Management System made of multiple programs that exchange data about members, books, and authors. One of the programs is required to process member data by enriching it with the member’s full name, keeping only the books whose titles contain “Volleyball”, and adding the author’s full name to each book.

The program communicates over the wire using JSON: it receives the member data in JSON format and is expected to return it in JSON format.

Here is how the code for this program would look in Ballerina.
First, we create our custom record types. 

type Author record {
   string firstName;
   string lastName;
};

type Book record {
   string title;
   Author author;
};

type Member record {
   string firstName;
   string lastName;
   int age;
   Book[] books?; // books is an optional field
};

Then, a small utility function that calculates the full name of any record that has firstName and lastName string fields. We express this constraint using an anonymous record.

function fullName(record {
                     string firstName;
                     string lastName;
                 } a)
               returns string {
   return a.firstName + " " + a.lastName;
}

We use Ballerina query language to filter and enrich books:

  1. Only keep books whose titles contain “Volleyball”
  2. Enrich each book with the author’s full name

function enrichAuthor(Author author) returns Author {
   author["fullName"] = fullName(author);
   return author;
}

function enrichBooks(Book[] books) returns Book[] {
   return from var {author, title} in books
       where title.includes("Volleyball")  // keep only books whose title includes "Volleyball"
       let Author enrichedAuthor = enrichAuthor(author) // enrich the author field
       select {author: enrichedAuthor, title: title}; // select some fields
}

Now, we write our business logic: a function that enriches a Member record with:

  1. The full name of the member
  2. The filtered and enriched books
     
function enrichMember(Member member) returns Member {
   member["fullName"] = fullName(member); // fullName works on member and authors
   Book[]? books = member.books; // books is an optional field,
   if (books is ()) { // handle explicitly the case where the field is not present
       return member;
   }
   // the type system is smart enough to understand that here books is guaranteed to be an array
   member.books = enrichBooks(books);
   return member;
}

Finally, we write the program entry point that does the following:

  1. Parse JSON input into a Member record
  2. Call the function that deals with the business logic to get an enriched Member record
  3. Serialize the result to JSON

Note that we have to deal with the JSON string we receive being invalid. This is how it’s done:

  1. We declare that the return value could either be a string or an error.
  2. We call check on what is returned by fromJsonStringWithType. Ballerina automatically propagates an error, in case the JSON string we received is invalid.

function entryPoint(string memberJSON) returns string|error {
   Member member = check memberJSON.fromJsonStringWithType();
   var enrichedMember = enrichMember(member);
   return enrichedMember.toJsonString();
}

That’s it for the code that deals with the logic itself. You can find the complete code on GitHub.

To make it into a real application, I would use one of the many protocols that Ballerina supports out of the box for communicating over the wire, like HTTP, GraphQL, Kafka, gRPC, WebSockets, and more.
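As a rough idea of what this could look like over HTTP, here is a deliberately reduced sketch that only adds the member’s full name. The port (8080), the /members path, and the simplified Member type without books are all assumptions made for this example; they are not part of the program above.

```ballerina
import ballerina/http;

// Simplified Member type for this sketch (no books field).
type Member record {
    string firstName;
    string lastName;
    int age;
};

function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

// Port 8080 and the /members path are arbitrary choices for this sketch.
service / on new http:Listener(8080) {
    // The JSON request body is bound to a Member record, and the returned
    // record is serialized back to JSON in the response.
    resource function post members(@http:Payload Member member) returns Member {
        member["fullName"] = fullName(member);
        return member;
    }
}
```

With this approach, parsing and serialization are handled by the HTTP module's data binding, so the resource function deals only with records.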

Wrapping up

While working on the code snippets that are presented in this article, I had the impression that I was re-experiencing the pleasant sensation that my IDE used to bring me when I was working on statically typed languages. I was surprised to discover that to enjoy this experience, this time I didn’t have to compromise on the power of expression and the flexibility I’d gotten addicted to since starting to work with dynamically-typed languages. 

The main thing that I’m missing in Ballerina is the ability to update a piece of data without mutating it, as I am used to in functional programming. I was not able to implement this capability as a custom function in Ballerina, as it requires support for handling generic types. But I do hope that in the near future this capability will be added to the language.
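In the meantime, one workaround for a specific record type is to copy the value first and then update the copy. The sketch below assumes the Author record from earlier and relies on the built-in clone() function, which makes a deep copy; withFullName is a hypothetical helper name. It is not the generic, non-mutating update described above, only a type-specific approximation of it.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

// Hypothetical helper: returns an enriched copy and leaves the argument untouched.
function withFullName(Author author) returns Author {
    Author copy = author.clone(); // deep copy of the original record
    copy["fullName"] = author.firstName + " " + author.lastName;
    return copy;
}

public function main() {
    Author original = {firstName: "Bob", lastName: "Miller"};
    Author enriched = withFullName(original);

    io:println(original.hasKey("fullName")); // false: the original is unchanged
    io:println(enriched.hasKey("fullName")); // true: only the copy was enriched
}
```

The cost of this approach is a full deep copy on every update, which may matter for large nested records.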

I see Ballerina as a general-purpose programming language, whose approach to data makes it a great fit for building information systems. In my opinion, this is due to Ballerina’s key values around data representation, data manipulation, and data communication.

  • It treats data as a first-class citizen
  • Its flexible type system delivers more flexibility than traditional statically-typed languages, without compromising on safety and tooling
  • Its flexible type system delivers more tooling and safety than dynamically-typed languages, without compromising on velocity and power of expression
  • It has an expressive query language for data manipulation
  • It supports JSON out of the box for exchanging data over the wire

You can learn more about Ballerina by visiting ballerina.io.

In the upcoming articles of our Ballerina series, we will cover additional aspects of Ballerina, like tables, advanced queries, error handling, maps, the json type, connectors, and more. You can sign up for our newsletter to be notified when the next article in the Ballerina series is published.

Community comments

  • Units

    by Stein Somers,

    “two records of different types that have the exact same field values are considered equal”. That sounds scary to me, like JavaScript's equality. How can we make sure that these two are not considered equal:

    ClimateOrbiterMetric sent = {
    angularMomentum: 1.5
    };

    ClimateOrbiterImperial received = {
    angularMomentum: 1.5
    };


    “For instance, it is not accurate to say that the function fullName -- defined above -- receives is a Named record.” At least grammatically it's certainly more accurate to say "that what".

  • Re: Units

    by Ches Martin,

    In a statically typed language such as this one, I'd argue that the solution to the problem you pose is to use the type system: rather than using a double type for angularMomentum, encode it as a type which expresses the unit explicitly so that the type system prevents you from comparing two records that use different units for their like fields. (This is sometimes called "primitive obsession").

    It's going a bit further adrift from the question, but some languages have features to make these kinds of wrapper types (e.g. Kilogram(1.5)) lightweight or zero-cost in some cases at runtime versus the underlying double value—see value objects a.k.a. user-defined primitives for Java, Haskell's newtype, etc. I'm not familiar with whether Ballerina has any such feature yet.

    Perhaps for more complex units like angular momentum, a library for units & dimensional analysis can be developed for Ballerina, like Squants in Scala :-)
