C#'s Functional Journey

Summary

Mads Torgersen discusses how object-oriented languages, particularly C#, have adopted functional features, and what to expect next.

Bio

Mads Torgersen is the lead designer of the C# programming language, and a program manager at Microsoft. Over the years he’s also participated in the design of TypeScript, Visual Basic.NET and even Java.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Torgersen: I'm Mads Torgersen. I am the current lead designer of C#. I've been that for a good half decade now, and worked on the language for about 15 years. It's just a bit older than that, about two decades old. During that time, it's gone through a phenomenal journey of transformation. It started out as a very classic, very turn-of-the-century mainstream object-oriented language, and has evolved a lot. Many of the things that happened over time were inspired by, borrowed from, or stolen from the functional world. There's been a lot of crossover there.

Functional Additions to C# over Time

There are many things that we can talk about here. Just to put a little bit of structure on it, I'll start in the past. I'll talk about the functional stuff we were getting into C# around the time when I first joined 15 years ago, and throughout that period. That phase was very much about staying within the object-oriented paradigm, but trying to enhance it in various ways, in particular around control flow and those kinds of things. Then we went into a phase that we're still in, where the focus was really triggered by the cloud and devices. Much more than in the distant past, which many of you probably don't remember, data is something that is shared between many different applications, concurrently. That's a scenario where, in some sense, history is on the side of functional programming. This is a point in time where we had to not just enhance object-oriented programming, but provide an alternative within the C# language, supplementing object-oriented programming by making a more functional paradigm easier within the language. Then finally, we can talk a little bit about the future: some of the thoughts we have in different directions inspired by functional programming, including, through the type system, how to really get rid of some of the strong dependencies that are a bit of a tax on software development processes in object-oriented programming today. That's the overall structure of it.

I'm going to spend most of this talk, the first two bullets of these three, in Visual Studio, which some of you may have heard of. It's the world's first and foremost IDE, as I'm sure many of you will agree, and it comes from Microsoft. It has excellent support for C#. I am going to go right in there. We're going to simply take a little journey through the first three versions of C# and all the functional stuff that happened there.

C# 1.0

This is a C# 1.0 program, pretty much, attempting to abstract over some functionality using first class functions. The very first version of C#, unlike many object-oriented programming languages, actually supported first class functions from the get-go. You have function types, but you have to declare them. They're nominal: just like declaring a class, an interface, or a struct, you have to declare a delegate, as we call them. They're function types. They're really crappy in many ways, but they get the job done. You declare the Predicate type here as something that takes an int and returns a bool. Then I can write a method that takes a predicate, using that type. When I call the method, I have to pass it a function. There are no function literals yet, so I have to pass it a named method. I can pass it the method GreaterThanFive here, which fits the type structurally, and therefore can be converted to a delegate object, which is a first class function passed to Filter and applied there. You can see the application of it here: a little call of p, invoking it as a first class function.

That's all right there in C# from the get-go, which is a little bit of a leg up on some of the other languages. Obviously, it's fairly clunky. From a type perspective, it's not super elegant. That's something it shares with most object-oriented languages of the time. The programming world has these two different kinds of polymorphism, two different ways that you can write code that applies to multiple different types at a time. The functional camp has gone with parametric polymorphism, where your code is parameterized over types and can be applied to different types. Object-oriented programming had gone with subtype polymorphism, where you have hierarchies of different classes, and they can all pose as the base type and be mixed together, in collections and so on. Both have strengths and weaknesses. Around the turn of the century, we were all engaged in trying to figure out how to get parametric polymorphism into object-oriented programming languages, to help with things like this: I have a collection here that is untyped. I put ints into it but I can't say that, so I have to cast them as they come out again. The whole thing is specific to ints; there's no abstraction over types.
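The code shown on screen is not part of the transcript. As a rough reconstruction from the description above, a C# 1.0-style version might look like the following sketch. The names Predicate, Filter, and GreaterThanFive come from the talk; the class name Csharp1Demo and the sample numbers are assumptions made for illustration.

```csharp
using System;
using System.Collections;

// A nominal function type: every delegate has to be declared by name.
public delegate bool Predicate(int value);

public static class Csharp1Demo
{
    // A named method whose signature matches the Predicate delegate.
    public static bool GreaterThanFive(int value)
    {
        return value > 5;
    }

    // Takes a first class function and applies it to each element.
    public static int[] Filter(int[] source, Predicate p)
    {
        ArrayList matches = new ArrayList();   // untyped: it stores object
        foreach (int value in source)
        {
            if (p(value))                      // invoke the delegate
            {
                matches.Add(value);            // the int is boxed to object
            }
        }

        int[] result = new int[matches.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = (int)matches[i];       // cast on the way out
        }
        return result;
    }

    public static void Main()
    {
        int[] numbers = new int[] { 1, 4, 6, 9, 3, 12 };
        // No function literals yet, so a named method is passed.
        int[] big = Filter(numbers, new Predicate(GreaterThanFive));
        foreach (int value in big)
        {
            Console.WriteLine(value);          // prints 6, 9 and 12, one per line
        }
    }
}
```

The cast in the copy loop is the cost of the untyped collection that the paragraph above points out: nothing in the types says the ArrayList only holds ints.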

C# 2.0

The first thing we do with C# 2.0 is add generics, because we need parametric polymorphism. Now I can make my types generic, so I can make my Predicate type generic here and have it apply to all kinds of values, not just ints. That's very nice. Of course, I then need to say that what I take here in my Filter method is a predicate of int. I can also make methods generic. Even those of you who don't know C# will know this from Java, so you should not be surprised. The support for generics in C# is different mainly in that it is worked all the way into the runtime. It's not erased by the compiler. It runs deeper. It's more complete. I can make my Filter method generic as well, saying I can take any predicate and any array, and return an array of the same element type. If I just replace all the ints with T's here (except that one, which I should not replace), I should be able to apply the filter to anything, as long as the T's match up in the signature. Here you can see that my call to Filter doesn't actually need to pass int explicitly as a type argument. That can be inferred implicitly. There's a little bit of type inference creeping into C# as well. That's generics.

With generics arrives the opportunity to have better libraries. That's a big thing: when you add generics, you can add generic libraries. Let's forget about these old, clunky collections. Let's have some generic ones. Now we can chuck this ArrayList on the garbage heap. We can use List of T instead, the new generic list. That is all very nice. Now I don't have to cast things when I take them out, because they're already known to be ints. All these benefits. What's more, List of T can actually have better methods on it, because it knows its type. We could actually just return the result of calling ToArray on this accumulator that we have. We don't even have to do our own manual building up of the result here. There's a method for that. Generics really help with the abstraction here, all by taking this functional programming language polymorphism into C# as well.
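As an illustration of the generic version described above, here is a minimal sketch. It declares its own Predicate of T to mirror the talk (the class library also ships a System.Predicate of T), and the class name GenericsDemo and sample data are assumptions.

```csharp
using System;
using System.Collections.Generic;

// One generic delegate type now covers predicates over any T.
public delegate bool Predicate<T>(T value);

public static class GenericsDemo
{
    public static bool GreaterThanFive(int value)
    {
        return value > 5;
    }

    // Generic method: works for any element type, as long as the T's line up.
    public static T[] Filter<T>(T[] source, Predicate<T> p)
    {
        List<T> matches = new List<T>();       // typed collection, no casts
        foreach (T value in source)
        {
            if (p(value))
            {
                matches.Add(value);
            }
        }
        return matches.ToArray();              // the library builds the array
    }

    public static void Main()
    {
        int[] numbers = new int[] { 1, 4, 6, 9, 3, 12 };
        // T is inferred as int from the arguments; no Filter<int> needed.
        int[] big = Filter(numbers, GreaterThanFive);
        Console.WriteLine(big.Length);         // prints 3
    }
}
```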

This should be repetition for people who went through this in real time. The rest of you, who grew up much later than us, will be like, "That's completely basic." This is how it came into C# and into many object-oriented programming languages of that age. It became the new standard for what should obviously be in any modern language. We can actually do a little better here. Array used to be the only generic type; now we can have other collection types. For instance, we can have one that's called IEnumerable of T, which abstracts over essentially anything that can be iterated over. We can take IEnumerable of T, and we can return IEnumerable of T. I'm getting sick of writing types all the time, so I'll cheat and use a C# 3.0 feature here and just use type inference on local variables. IEnumerables can be foreached over just like arrays and everything else.

When I do that, I can now write my method as a generator, which is another thing that came out of the functional world where I am lazily producing the results of this filter, as you ask for them. Instead of having this accumulation into a temporary collection, and returning that, I can simply say, yield return, because this is an old language, we have to have some syntax there. Yield return the value, and then I don't need the temporary collection. I don't need to return. I'm simply now writing a generator that whenever the foreach here asks for the next value, the IEnumerable will go say, let me run the next bit of code that produces that. Again, another functional thing that makes it into C# 2.0.
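To make the laziness concrete, here is a small sketch of Filter written as an iterator. The Evaluated counter and the class name IteratorDemo are additions for illustration only; the sketch uses the library's System.Predicate of T rather than a hand-declared delegate.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    public static int Evaluated = 0;   // counts predicate calls, to make laziness visible

    public static bool GreaterThanFive(int value)
    {
        Evaluated++;
        return value > 5;
    }

    // An iterator method: the body runs only as the caller asks for values.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> p)
    {
        foreach (T value in source)
        {
            if (p(value))
            {
                yield return value;    // hand one result to the consumer, then pause
            }
        }
    }

    public static void Main()
    {
        int[] numbers = { 1, 6, 4, 9, 12 };
        foreach (int value in Filter(numbers, GreaterThanFive))
        {
            Console.WriteLine(value);  // prints 6
            break;                     // stop after the first match
        }
        Console.WriteLine(Evaluated);  // prints 2: only 1 and 6 were tested
    }
}
```

Because the consumer stops after the first match, the remaining elements are never tested, which is the difference from accumulating into a temporary collection first.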

The last thing, though, is the first attempt at having function literals, or anonymous functions. Instead of having to have a named method for everything that we want to pass as a function, we can get rid of that declared method there. The syntax isn't beautiful, but this is the first run at a syntax for anonymous functions.

C# 3.0

Now comes C# 3.0, and we want to start addressing scenarios like interoperating with databases. What we call language integrated query is the big new thing here. In order to do that, we need to do better on the functional front. First of all, let's take this clunky syntax and give it a do-over. Let's have lambda expressions in C# instead. You can write it like this, which isn't much better. Lambda expressions can optionally have expression bodies, so that gets a little better. Also, we get more type inference. You don't actually have to state the parameter types of a lambda expression, as long as it is passed to something that will tell it the types. Because we're calling Filter with an array of int, T will be inferred to be int. Then the parameter is inferred to be a predicate of int, and therefore int is pushed back down into the lambda, and its body is type checked in the context of that. More type inference there.
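The progression from anonymous methods to lambdas can be sketched side by side as follows. The three calls are equivalent; the class name LambdaDemo and the sample data are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class LambdaDemo
{
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> p)
    {
        foreach (T value in source)
        {
            if (p(value)) yield return value;
        }
    }

    public static void Main()
    {
        int[] numbers = { 1, 4, 6, 9, 12 };

        // C# 2.0 anonymous method: no named method, but verbose.
        var a = Filter(numbers, delegate(int x) { return x > 5; });

        // C# 3.0 lambda with an explicit parameter type and a block body.
        var b = Filter(numbers, (int x) => { return x > 5; });

        // Expression body; x is inferred to be int because T is inferred
        // from the int[] argument first.
        var c = Filter(numbers, x => x > 5);

        Console.WriteLine(string.Join(",", a));   // prints 6,9,12
        Console.WriteLine(string.Join(",", b));   // prints 6,9,12
        Console.WriteLine(string.Join(",", c));   // prints 6,9,12
    }
}
```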

We also realize that we're going to be using first class functions a lot, so this whole declaring your own delegates, that really sucks. Functional languages typically have structural function types. We should get at least closer to that. We can just add a library of generic delegates to C# itself. We now have Func, and that's a delegate type that's pre-declared. We have them up to a certain number of parameters, which grows for every release, because we want to try to capture bigger functions. We can get rid of that as well. Now we're down to something that is fairly neat. You saw the var here. We can put var in many places. More type inference. Who needs to see the types all the time? Then, once you start using something like Filter here, you might want to apply more than one filter. Now it gets really annoying, because when I want to apply another filter to this, I get this nesting going on. Let's say I want to get all the things that can be divided by three. That just looks really ugly. That becomes a lump of code.

What a functional language would do here is have a pipeline operator. It would say, the first parameter there, we can pipe that into a function call with some syntax. C# can't do this, so I'm just going to wing it here. I could even take the original array and pipe that into the first filter call there, something like this. This is what a functional language would do. We don't put that into C#. What we do instead is say, ok, you can declare a static method like this as an extension method. That means that it can be called on its first parameter as if it were an instance method on it. Now I can say array.Filter. I can say that, then .Filter again, and you get the fluent flow that you would get in a functional language as well. Yet another inspiration there. Extension methods really prove their worth in C#.
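A minimal sketch of the nested call versus the extension-method chain described above, using the predeclared Func delegate. The class names FilterExtensions and ExtensionDemo and the sample data are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class FilterExtensions
{
    // "this" on the first parameter makes Filter callable as source.Filter(...).
    // Func<T, bool> is a predeclared delegate type, so no custom delegate is needed.
    public static IEnumerable<T> Filter<T>(this IEnumerable<T> source, Func<T, bool> p)
    {
        foreach (T value in source)
        {
            if (p(value)) yield return value;
        }
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        int[] numbers = { 1, 3, 6, 9, 10, 12 };

        // Nested calls read inside-out.
        var nested = FilterExtensions.Filter(
            FilterExtensions.Filter(numbers, x => x > 5), x => x % 3 == 0);

        // Extension-method calls read left to right, like a pipeline.
        var fluent = numbers.Filter(x => x > 5).Filter(x => x % 3 == 0);

        Console.WriteLine(string.Join(",", nested));   // prints 6,9,12
        Console.WriteLine(string.Join(",", fluent));   // prints 6,9,12
    }
}
```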

Of course, querying is a general thing. We add a library called LINQ, for Language Integrated Query, where you don't even have to write your own Filter method. You can just use the Where method from there, and it looks exactly the same as the Filter I just used. We can get rid of that as well. We're doing some querying here. This is really beautiful, and you can add other query methods. For instance, you could have a Select method here where we transform the result. Let's say we also add support for little anonymous record-like things. I could output an anonymous object of x and x-squared or whatever, y equals x-squared, some silly thing. You can put together what looks more like a SQL query. You'd really like some nice syntax, just like SQL. We could say: from x in array, where x less than five, select. You can now write this as syntactic sugar for the same thing, for stringing together these calls and lambdas. This is a functional construct. This is a list comprehension or query comprehension. It's monadic. C# has monads in this particular case, and it does the job. That's the classic phase.
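The two query forms described above can be sketched as follows, using the real System.Linq Where and Select methods. The class name LinqDemo, the helper SquaresBelowFive, and the sample data are assumptions.

```csharp
using System;
using System.Linq;

public static class LinqDemo
{
    public static int[] SquaresBelowFive(int[] source)
    {
        // Query syntax: sugar that the compiler turns into Where/Select calls.
        var query = from x in source
                    where x < 5
                    select new { x, y = x * x };   // anonymous type with x and y

        return query.Select(pair => pair.y).ToArray();
    }

    public static void Main()
    {
        int[] array = { 1, 7, 3, 8, 4 };

        // Method syntax with the library's Where and Select.
        var viaMethods = array.Where(x => x < 5).Select(x => new { x, y = x * x });

        foreach (var pair in viaMethods)
        {
            Console.WriteLine(pair.x + " -> " + pair.y);   // 1 -> 1, 3 -> 9, 4 -> 16
        }

        Console.WriteLine(string.Join(",", SquaresBelowFive(array)));   // prints 1,9,16
    }
}
```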

We also add the ability to actually quote lambda expressions, so that when you pass them, instead of passing them to a delegate type, you pass them to another type called Expression of T. Instead of getting a lambda function you can execute, you actually get a syntax tree of the lambda at runtime. Lisp-style code quotation comes in at this point. That's how we build our ORM support, the object-relational mapping. When you write a query like this, it gets translated into a tree that can then be passed to a SQL translator, turned into SQL, passed to a SQL database, and run as a query like that. That was our first functional boost.
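As a small illustration of quoting, the same lambda text can be converted either to a delegate or to an expression tree from System.Linq.Expressions. The sketch below only inspects the tree; it does not translate anything to SQL, and the class name QuoteDemo is an assumption.

```csharp
using System;
using System.Linq.Expressions;

public static class QuoteDemo
{
    // The same lambda text, converted to a tree instead of a delegate.
    public static Expression<Func<int, bool>> Quoted()
    {
        return x => x > 5;
    }

    public static void Main()
    {
        Func<int, bool> asCode = x => x > 5;     // compiled code you can call
        Expression<Func<int, bool>> asData = Quoted();

        Console.WriteLine(asCode(7));            // prints True

        // The tree can be inspected at run time, which is what a SQL
        // translator relies on.
        var body = (BinaryExpression)asData.Body;
        Console.WriteLine(body.NodeType);        // prints GreaterThan
        Console.WriteLine(body.Left);            // prints x
        Console.WriteLine(body.Right);           // prints 5

        // It can also be compiled back into a delegate.
        Console.WriteLine(asData.Compile()(3));  // prints False
    }
}
```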

Aspect of C# Syntax that Could Be Retired

Randy: Is there any aspect of the C# syntax you would retire?

Torgersen: Yes. Certainly, that first shot at function literals, the anonymous methods, was a clunky mistake, and we overrode it immediately in C# 3.0 by coming up with a more flexible lambda syntax. Of course, we can't take it out. That's an obvious one. I also think of these delegates. I'm not showing the underbelly of the delegate types, the function types, but the way we designed them in C# 1.0, they really weren't designed for functional programming. They were designed for programming with events, like supporting subject-observer style eventing. They can contain more than one function. They can contain a list of functions. If you execute them, all of those functions will get called, and you'll only get the result of the last one. It's really hideous. I would have a do-over on that whole thing if I could.
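The multicast behavior mentioned in this answer can be seen in a few lines. This is a minimal sketch; the names MulticastDemo, One, and Two are made up for illustration.

```csharp
using System;

public static class MulticastDemo
{
    public static int Calls = 0;

    public static int One() { Calls++; return 1; }
    public static int Two() { Calls++; return 2; }

    public static int InvokeBoth()
    {
        Func<int> f = One;
        f += Two;        // the delegate now holds an invocation list of two methods
        return f();      // both run; only the last result comes back
    }

    public static void Main()
    {
        Console.WriteLine(InvokeBoth());   // prints 2
        Console.WriteLine(Calls);          // prints 2
    }
}
```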

How Current and Envisioned Future State of C# Support for Functional Programming Stack Up Against Scala

Randy: How does the current and envisioned future state of C# support for functional programming stack up against for example, Scala, which we know is a purely functional or more functional language?

Torgersen: I love Scala. Some details I don't like, but I love Scala in its philosophy, which is to produce a genuinely, not so much a multi-paradigm language but a unified paradigm language where everything is carefully worked together. As we add functional things to C#, we try to do it in the same way where they fit well with what's already there, rather than being a separate part of the language. I really adore that as a philosophy. I don't think we will get to where C# is a complete balance between functional and object-oriented, we're always object first. We're never going to compete with a functional first language like F#, for instance. There are so many things from functional programming, so many idioms, such an amount of type inference that we could never get there from here. It's in between there. We take these functional things in not because we want to be multi-paradigm, but because there are scenarios where it's really useful.

Cloud Driven Object-Oriented Programming Encapsulating Functionality and Data

The next wave really is this cloud-driven wave. Object-oriented programming focuses a lot on encapsulating functionality and data together. That's great for some scenarios. It really had a golden era and it's still good for many things. When your data is in the cloud, and is being shared across many different application areas and used in different ways, then packaging the functionality and the data together just doesn't make sense anymore. There, you really want to have the functions on the outside, not on the inside, which means the core data needs to be public in the data type. It often needs to be immutable, depending on how you architect. In an object-oriented program, you get shape-dependent behavior by overriding virtual methods in a hierarchy. That shape dependency, you need to be able to express from the outside. You need to write a function that takes some object in, where the object doesn't know about the function, but the function itself does different things depending on the type of the object. That's what pattern matching is for.

C# 7.0

Fast forward to C# 7.0, and we get pattern matching into C#. We start by laser focusing on that scenario only. Say you have some static void M that takes an object o. You've always been able to say in C#, if (o is string), for instance, and get a Boolean result back. But when you then want to do something with it, you've lost it as a string. You checked that it was a string, and then you lost it. You will say something like Console.WriteLine of String. What if you could just give that string a name while you're at it? Now you can say o is string s, and we can output the string as well, like this, in an interpolated string.

This now is an example of a pattern. Initially, we just allow patterns inside already existing control structures in C#, so the is expression gets enhanced to not just check against types, but to check against various patterns. The switch statement as well. I can say switch of o. The switch syntax is just hideous in C#, as in all the other C-based languages. Instead of just comparing against constants in the cases, I can say case string s, then go do my thing. It's classic switch stuff, where you have to put in a break in order to close things out.
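A minimal sketch of the two C# 7.0 forms just described, a type pattern in an is expression and in switch case labels. The method and class names are assumptions, and the cases return instead of using break to keep the sketch short.

```csharp
using System;

public static class PatternDemo
{
    public static string DescribeWithIs(object o)
    {
        // Type pattern: test the type and name the result in one step.
        if (o is string s)
        {
            return $"string of length {s.Length}";
        }
        return "not a string";
    }

    public static string DescribeWithSwitch(object o)
    {
        switch (o)
        {
            case null:                 // constant pattern
                return "null";
            case string s:             // type pattern in a case label
                return $"string {s}";
            case int i:
                return $"int {i}";
            default:
                return "something else";
        }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeWithIs("hello"));     // prints string of length 5
        Console.WriteLine(DescribeWithSwitch(42));      // prints int 42
        Console.WriteLine(DescribeWithSwitch(null));    // prints null
    }
}
```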

We also retconned constants: constant values were now also a pattern. I can also say is null, or things like that. Now patterns have a place in the language. There are pretty much only these two patterns to begin with. Since then we've been expanding on which patterns we have in C# and where they can be used, so that we get more towards the expressiveness of functional languages. I want to show you an example of that. Since what I showed you, we've evolved C# to have expression bodies in ordinary methods as well. This is an expression-bodied method that takes in some object. Again, it's trying to apply functionality from the outside of the object model by asking, which thing is this? If it's a car, then we do one thing. If it's a taxi, then the pattern is more fancy. It will apply a property pattern to the taxi, look at its Fares property, and apply a constant pattern to that. Say there are zero people here, so they have to pay this much to go over the bridge, which is what this is all for. We have deconstruction in the language now. Bus has a deconstructor, so we can apply a positional pattern, applying patterns recursively, again, to the parts of the bus.

In this nested switch here, we have relational patterns, now coming into C# 9.0. We have the logical patterns and, or, and not to combine other patterns, and so on. Patterns have taken off as a thing of their own. Of course, this whole thing is not a switch statement but a switch expression, which is much more nifty and modern. This has ballooned over time to be a really expressive part of C#, from not being there at all a couple of versions ago. In C# 6, there was no such thing as pattern matching. That's a big concession to the fact that you need this style of programming to make up for the fact that you can't wrap the functionality up with your object hierarchy. I think you're seeing many languages doing the same thing, all driven by the need to work on outside data.
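The slide is not in the transcript, so the following is only a sketch of the kind of method being described. The type names Car, Taxi, and Bus and the Fares property come from the talk; the Bus members, the toll amounts, and the thresholds are invented for illustration.

```csharp
using System;

public class Car { }

public class Taxi
{
    public int Fares { get; set; }
}

public class Bus
{
    public int Capacity { get; set; }
    public int Riders { get; set; }

    // A deconstructor lets positional patterns take the object apart.
    public void Deconstruct(out int capacity, out int riders)
    {
        capacity = Capacity;
        riders = Riders;
    }
}

public static class Tolls
{
    // Expression-bodied method whose body is a switch expression.
    // The amounts are made up for illustration.
    public static decimal Calculate(object vehicle) => vehicle switch
    {
        Car => 2.00m,                                   // type pattern
        Taxi { Fares: 0 } => 4.50m,                     // property + constant pattern
        Taxi => 3.50m,
        Bus(var capacity, var riders) => ((double)riders / capacity) switch
        {
            < 0.5 => 7.00m,                             // relational pattern
            >= 0.5 and < 0.9 => 5.00m,                  // logical "and" pattern
            _ => 3.00m,
        },
        null => throw new ArgumentNullException(nameof(vehicle)),
        not null => throw new ArgumentException("Unknown vehicle type", nameof(vehicle)),
    };

    public static void Main()
    {
        Console.WriteLine(Calculate(new Taxi { Fares = 0 }) == 4.50m);                 // prints True
        Console.WriteLine(Calculate(new Bus { Capacity = 50, Riders = 10 }) == 7.00m); // prints True
    }
}
```

None of the vehicle classes know about Calculate; the shape-dependent behavior lives entirely in the function on the outside.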

Speaking of outside data, one of the things you often want to do is treat that outside data more as a value, and often also work with it as an immutable value. Here's an example: this is me before I was married, and then I changed my last name. That's how you would mutate the thing. We added support in C# for classes like this to have properties that can only be mutated during initialization. In this object initializer here, when the object is being created, I can still set the properties, but afterwards I can't change the last name. If I want to change the last name, I need to create a new object. I need to have an immutable discipline where I create a copy. I copy and modify. I do non-destructive mutation.
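Init-only properties of the kind described here shipped in C# 9.0 with the init accessor. A minimal sketch, with a hand-written copy-and-modify helper; the names and values are placeholders, not the ones from the slide.

```csharp
using System;

public class Person
{
    // "init" accessors can be set in an object initializer, and nowhere after.
    public string FirstName { get; init; } = "";
    public string LastName { get; init; } = "";
}

public static class InitDemo
{
    // Copy-and-modify by hand: build a new object instead of mutating.
    public static Person WithLastName(Person original, string lastName) =>
        new Person { FirstName = original.FirstName, LastName = lastName };

    public static void Main()
    {
        var person = new Person { FirstName = "Alex", LastName = "Smith" };
        // person.LastName = "Jones";   // compile-time error: init-only property

        var renamed = WithLastName(person, "Jones");
        Console.WriteLine(person.LastName);    // prints Smith
        Console.WriteLine(renamed.LastName);   // prints Jones
    }
}
```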

Records in C#

In order to support value semantics like that, you now also have records in C#, which come with a bunch of abbreviations and stuff. The main thing they give you is value semantics by default. Instead of always assuming object-oriented stuff, so that you have to override the defaults and write very long things to be immutable and value based, they give you this by default. Instead of mutating person.LastName, I can now say var newPerson equals person with LastName equals Torgersen. There's non-destructive mutation supported in the language on records. More types will come over time to support non-destructive mutation, where you can say it's like that one, with everything copied over. That's merged with the object-oriented paradigm. You can see that I'm actually creating an employee here, but I'm only storing it as a person. The new person here will also be an employee, and the with expression will copy over even the things that can't be seen statically at this point. That's us trying to fit this in with subtype polymorphism. You really need to make sure that non-destructive mutation works well even when you don't have the whole truth about the actual runtime type of things at the point where you do the copying and mutation.

The other thing with records is that they have value equality. Comparing two records corresponds to comparing all the members and making sure that they're equal. We take care of generating all that code and the hash code, and also making sure that it's symmetric. It's not difficult per se, but it's a maintenance nightmare to maintain a manually written Equals function that is symmetric, remembers to deal with all the data, and so on. We do that for you as well on records. That's another step towards functional programming.
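Both record behaviors described above, with expressions that preserve the runtime type and generated value equality, can be sketched as follows. The Person and Employee shapes and all values are placeholders rather than the ones on the slide.

```csharp
using System;

public record Person(string FirstName, string LastName);
public record Employee(string FirstName, string LastName, string Company)
    : Person(FirstName, LastName);

public static class RecordDemo
{
    public static void Main()
    {
        // Statically a Person, but the object at run time is an Employee.
        Person person = new Employee("Alex", "Smith", "Contoso");

        // Non-destructive mutation: copy everything, change one property.
        Person renamed = person with { LastName = "Jones" };

        Console.WriteLine(renamed is Employee);                 // prints True
        Console.WriteLine(((Employee)renamed).Company);         // prints Contoso
        Console.WriteLine(person.LastName);                     // prints Smith

        // Value equality: same type and same member values means equal.
        var a = new Person("Alex", "Smith");
        var b = new Person("Alex", "Smith");
        Console.WriteLine(a == b);                              // prints True
        Console.WriteLine(ReferenceEquals(a, b));               // prints False
        Console.WriteLine(a == person);                         // prints False: different runtime types
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // prints True
    }
}
```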

Performance Concerns with Immutability

Randy: There were several people wondering about performance concerns with immutability.

Torgersen: This goes pretty much regardless of the programming language features, whether they are nifty or not. Typically, immutability means allocating a lot of objects. In C#, sometimes you can get around that by using structs instead, but then you have higher copying costs, because structs are value types, and they assign by copy. If they're big, the lesser evil is to have them be reference types, to be classes. Records here are classes as well. They are objects. We have built very big things that use an immutable discipline. The C# compiler is not just a compiler, it's an API that can give you every syntactic and semantic detail about a C# program. It can help you build more C# code. It's what's being used by the C# IDE here, as I'm using the IDE. It's using incremental APIs and everything. It's a big API, and the whole data model is immutable. Performance really was a problem. Allocation was a problem as we built it. We've had to do tricks along the way to share as much as possible, to use structs where we can, to cheat and not copy things where we can, and so on. That's the scourge of immutable programming. I don't know that there's much we can do about it, other than try to tune our garbage collectors for it.

When to Choose a Class versus Record

Randy: When would you choose a class versus a record and vice versa?

Torgersen: Records really are classes with extra bells and whistles. When the defaults of classes, reference equality and mutability, are not what you need, that's when you choose a record: when you need to be value based and potentially immutable. You don't have to be immutable. I can totally have a record that's mutable. C# doesn't prevent it in any way. You just have to be careful when you have mutable reference types with value equality, because the hash code will change over time. You stick that sucker in a hash table and you mutate it, and it can never be found. There be dragons.
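The hash table hazard mentioned in this answer can be reproduced with a mutable record and a HashSet. This is an illustrative sketch; MutablePerson and the values are made up, and the outcome holds barring an unlikely collision between the old and new hash codes.

```csharp
using System;
using System.Collections.Generic;

// A record with a settable property: legal, but risky as a hash key.
public record MutablePerson
{
    public string LastName { get; set; } = "";
}

public static class HashHazardDemo
{
    public static bool FoundAfterMutation()
    {
        var person = new MutablePerson { LastName = "Smith" };
        var set = new HashSet<MutablePerson> { person };

        person.LastName = "Jones";      // the value-based hash code changes here

        // The set filed the object under its old hash code, so the lookup
        // looks in the wrong place (barring an unlikely hash collision).
        return set.Contains(person);
    }

    public static void Main()
    {
        Console.WriteLine(FoundAfterMutation());   // expected to print False
    }
}
```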

The core choice really is: I am working with data from the outside. I'm treating it as data. I'm treating it as values. I don't care about the object. It will have object identity as a reference type, but I don't care about it. It's not the important part of the data model. You still have inheritance. You see records inheriting from each other here. You get all that modeling. We might add some shorthands to C# later to give you something like discriminated unions from functional programming. This gets you a fair bit of the way, in that you can have these value-oriented types where you can create a hierarchy to model the fact that there's a type with multiple different shapes. That gets you a long way towards discriminated union style semantics.

Static Members in Interfaces

With that, I think I will go to the last bit, which is the future. One thing that a few functional languages, notably Haskell, have really nailed is this thing called type classes, which allow you to abstract over some types having a certain set of functions, and then to say that a given type satisfies that type class. When you witness that, you can do so independently of the declaration of either the type class or the concrete type. You have a third place in the code where you put them together. That gives an amazing amount of decoupling. Tight coupling is a bit of a scourge of a lot of programming, definitely including object-oriented programming. In order for a class to implement an abstraction, it has to implement an interface, for instance. It has to say right there that it implements the interface. It has to have pre-knowledge that it was going to be used in a context where the interface makes sense. It's been interesting for us to study whether there is something we can learn from type classes and flip into a fully object-oriented feature set that could help us with that.

One thing is that this whole idea of functions on the outside is represented in C#, and in many object-oriented languages, by static members. If an interface could abstract over static members, like static properties, and also operators, which are overloaded in C#, then we could say that int implements this particular thing that has a plus and a zero. If we constrain our generic by it, we can then write generic numeric algorithms, for instance. That's nice: talking about the functions that apply to the type rather than just the instance members. We still have the strong coupling here, though: the int type has to know about IMonoid and implement it.
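At the time of the talk this was a design direction. Static abstract interface members later shipped in C# 11, and the idea can be sketched in that syntax as below. IMonoid is the name used in the talk; the Meters type, the Sum method, and the exact member shapes are assumptions, and the sketch needs a C# 11 or later compiler. A custom struct stands in for int because a built-in type cannot be made to implement a new interface after the fact.

```csharp
using System;

// Assumed shape of an interface that abstracts over static members.
public interface IMonoid<T> where T : IMonoid<T>
{
    static abstract T Zero { get; }
    static abstract T operator +(T left, T right);
}

// A hypothetical numeric type that opts in at its declaration.
public readonly struct Meters : IMonoid<Meters>
{
    public double Value { get; }
    public Meters(double value) { Value = value; }

    public static Meters Zero => new Meters(0);
    public static Meters operator +(Meters left, Meters right) =>
        new Meters(left.Value + right.Value);
}

public static class MonoidDemo
{
    // A generic algorithm written against the static members.
    public static T Sum<T>(params T[] values) where T : IMonoid<T>
    {
        T total = T.Zero;
        foreach (T value in values)
        {
            total = total + value;
        }
        return total;
    }

    public static void Main()
    {
        Meters total = Sum(new Meters(1.5), new Meters(2.5));
        Console.WriteLine(total.Value);   // prints 4
    }
}
```

Meters still has to name IMonoid in its own declaration, which is exactly the coupling the paragraph above points out.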

Extensions

That's the problem here. What if you could instead say that somewhere else? What if, after the fact, I could say: here is an extension, and if this is in scope for you, then it tells you that Int32 implements the IMonoid interface, as a third-party declaration. In the scope of this declaration, I know that Int32 actually implements this interface, even though the type and the interface didn't inherently know about each other. I can even say how it implements the interface in the cases where it doesn't already. It already has a plus operator, so that's implied. For the rest, the witness here can explicitly specify exactly how it is implemented. That's a direction in which we are thinking about taking the type system. It's going to be hard to do in practice, but we're working on it, because it can really affect software development and the decoupling of components in a way that object-oriented programming needs.

Recorded at:

Feb 24, 2021

Mads Torgersen
