Book Review and Interview: Real World OCaml
The statically typed functional language OCaml has been around for nearly two decades and has influenced languages like F# and Scala.
O'Reilly has released the book "Real World OCaml" to introduce readers to idiomatic OCaml programming as well as to the language's libraries and tools.
Similar to O'Reilly's earlier "Real World Haskell", the text of "Real World OCaml" can be viewed for free on the website, as well as the numerous inline comments readers and reviewers have provided.
OCaml differs from Haskell in many key ways: evaluation strategy (strict in OCaml vs. lazy in Haskell), monad worship (see the interview below), and many more.
Note: most answers are joint answers written by both authors, except for those explicitly marked with their names.
InfoQ: Who uses OCaml today and for what?
Here are some of the major industrial uses of OCaml.
There's a growing list of companies that use it on the OCaml website, as well as several years of videos online in the Commercial Uses of Functional Programming workshop archives.
- Jane Street is a trading firm that uses OCaml as its primary language. Jane Street trades billions of dollars a day on a platform that is almost entirely written in OCaml. Yaron is the head of technology there.
- Citrix uses OCaml for the XenServer virtual server distribution, which has been downloaded over a million times and powers datacenters such as the Rackspace Cloud. (There's a paper about this at: http://anil.recoil.org/papers/2010-icfp-xen.pdf)
- Facebook has been using OCaml to build development tools for years now. Their oldest project is Pfff, which is a set of static analysis tools for a variety of languages, including PHP, C++ and OCaml.
They've also recently announced a new compiler, called Hack, for a statically typed variant of PHP. The Hack compiler is an impressive accomplishment, being a parallelized compiler that provides nearly instantaneous recompiles for Facebook's multi-million-line PHP codebase. The demo we saw showed the compiler keeping up with a large git rebase, so that the compilation was finished 20 milliseconds after the git rebase.
Note: InfoQ recently interviewed Facebook's Keith Adams on the Hack language variant among other topics.
InfoQ: What drew you to OCaml rather than some of the other functional languages which might have larger communities?
OCaml has an unusual mix of qualities that make it great for systems development.
- It's simple: OCaml manages to have a powerful type system that can encode many invariants about your program while still holding together elegantly. That makes it easier to learn and easier to use on a day-to-day basis.
- It's stable: In the last decade OCaml has evolved in a modest and careful way, only adding features that have clear value. That provides a more stable base to engineer on top of.
InfoQ: Can you characterize OCaml for our readers? Where does it fit in the space of (functional) languages, i.e., static/dynamic, compiled/VM, memory management, functional/OOP/hybrid/neither, purity, laziness, state management, etc.?
Yaron: Sure, let's play feature bingo! OCaml is a statically typed, eager, GC'd, functional language that compiles to native code, as well as providing a yet-more-portable bytecode compiler. It provides good support for pure programming, but side-effects can be used freely, and are not tracked by the type system. It has a module system which provides powerful abstraction tools like functors and first-class modules. It also has an object system, but that is used relatively rarely.
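To make the module-system part of that description concrete, here is a minimal sketch of a functor and a first-class module. The names `ORDERED`, `Int_order`, `Max_of` and `sort_with` are invented for this illustration and do not come from the book or the interview; only `String`, `List.fold_left` and `List.sort` are standard library.

```ocaml
(* A signature: any type that comes with a comparison function. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* Hypothetical module satisfying ORDERED for ints. *)
module Int_order = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end

(* A functor: a module parameterised by another module. *)
module Max_of (O : ORDERED) = struct
  (* Largest element of a list, or None for the empty list. *)
  let max_opt (xs : O.t list) : O.t option =
    List.fold_left
      (fun acc x ->
        match acc with
        | None -> Some x
        | Some m -> if O.compare x m > 0 then Some x else acc)
      None xs
end

module Int_max = Max_of (Int_order)
module String_max = Max_of (String)

(* A first-class module: a module packed up as an ordinary value. *)
let int_order = (module Int_order : ORDERED with type t = int)

(* A function that takes a module as a run-time argument. *)
let sort_with (type a) (module O : ORDERED with type t = a) (xs : a list) =
  List.sort O.compare xs
```

With these definitions, `Int_max.max_opt [3; 7; 2]` evaluates to `Some 7` and `sort_with int_order [3; 1; 2]` to `[1; 2; 3]`. The standard library's `String` module can be passed to the functor directly because it already exposes a type `t` and a `compare` function.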
Anil: One of the earliest commercial uses I put OCaml to was to build the XenServer management toolstack, and we found that all three of the major programming styles had a place in the same (large) codebase. We used imperative OCaml to build the low-level functions to interface with the Xen hypervisor, purely functional code to build and test complex algorithms such as the bin-packing scheduler for VM placement, and object-oriented code for services such as logging.
All of these exist within the comforting confines of the ML type system, and you can use all or just one of them as your problem dictates. OCaml is an incredibly pragmatic language that lets you write quick'n'dirty code when you need to (much like a type-safe C), and use the module system to refactor it rapidly when you need to.
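As a small illustration of the three styles living side by side, the sketch below computes the same sum imperatively, functionally and with an object. It is a toy example written for this article, not code from the XenServer toolstack, and all names in it are hypothetical.

```ocaml
(* Imperative: a mutable accumulator and a for loop. *)
let sum_imperative (xs : int array) : int =
  let total = ref 0 in
  for i = 0 to Array.length xs - 1 do
    total := !total + xs.(i)
  done;
  !total

(* Functional: a fold with no mutation. *)
let sum_functional (xs : int array) : int =
  Array.fold_left ( + ) 0 xs

(* Object-oriented: a small object that hides its mutable state. *)
class accumulator = object
  val mutable sum = 0
  method add (x : int) = sum <- sum + x
  method total = sum
end

let sum_object (xs : int array) : int =
  let acc = new accumulator in
  Array.iter acc#add xs;
  acc#total
```

All three functions have the same type, `int array -> int`, so callers cannot tell which style is used behind the interface.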
InfoQ: Seeing that OCaml is a GC'd language: how suitable is it for low latency applications? Are there ways to keep data structures out of sight of the GC or do bulk allocation of objects (besides encoding them inside primitive arrays)?
Yaron: The first thing to understand about OCaml is that allocation, in particular short-lived allocation, is much cheaper than you're used to from languages like C# and Java: allocating a block on the minor heap requires only three instructions, and collection of those objects is very inexpensive if they don't survive to move to the major heap. So the need to allocate lots of short-lived objects is less of a concern than you'd naively imagine.
That said, if you want to build very high-throughput applications that can process millions of transactions a second, you need to avoid allocation, particularly long-lived allocations that end up on the major heap. Jane Street's Core library (which is used throughout Real World OCaml) provides tools for creating pooled objects for just this kind of use-case.
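The idea behind object pooling can be sketched with nothing but the standard library. The code below is an illustration only and is not Core's pooling API: the `order` record, the `pool` type and the functions `create`, `acquire`, `release` and `get` are all hypothetical names. Every record is allocated once up front, and slots are handed out and returned by index, so the steady state performs no new heap allocation.

```ocaml
(* Hypothetical record that is reused rather than reallocated. *)
type order = { mutable price : int; mutable qty : int }

type pool = {
  slots : order array;    (* all records, allocated once up front *)
  free : int array;       (* stack of free slot indices *)
  mutable n_free : int;   (* number of valid entries in [free] *)
}

let create (capacity : int) : pool =
  { slots = Array.init capacity (fun _ -> { price = 0; qty = 0 });
    free = Array.init capacity (fun i -> i);
    n_free = capacity }

(* Take a free slot index, or -1 if the pool is exhausted.
   A plain int is returned because an option would itself allocate. *)
let acquire (p : pool) : int =
  if p.n_free = 0 then -1
  else begin
    p.n_free <- p.n_free - 1;
    p.free.(p.n_free)
  end

(* Return a slot to the pool. This sketch does not detect double release. *)
let release (p : pool) (i : int) : unit =
  p.free.(p.n_free) <- i;
  p.n_free <- p.n_free + 1

(* Access the record stored in a slot. *)
let get (p : pool) (i : int) : order = p.slots.(i)
```

A caller would `acquire` a slot, overwrite the mutable fields of `get p i`, and `release` the slot when done, instead of building a fresh record for every message.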
Anil: The OCaml GC is remarkably predictable, and is one of the main reasons we use it to build systems infrastructure. One such project is the Mirage operating system (openmirage.org), which is an entire "library operating system" written in pure OCaml, from the device drivers, to the TCP/IP stack, to the filesystem logic and applications themselves. We compile all this code into a specialized tiny kernel that runs directly on the Xen hypervisor without requiring a full OS stack. Pulling this off while keeping a high performance metric required a certain amount of precision in the fast paths, and OCaml has been fantastic to work with. The GC has been good enough to sustain gigabits of throughput on the pure OCaml TCP/IP stack. I actually built a DNS and SSH server in pure OCaml and evaluated their throughput and latencies vs BIND and OpenSSH, in this Eurosys 2007 paper.
InfoQ's coverage of the Mirage OS 1.0 release provides more details about the project.
InfoQ: OCaml has a GIL (global lock) which means only one OCaml thread can be active at any time. How big of an issue is this for OCaml users, or are there ways to work around that? Any common libs to support multi-processing with OCaml?
In OCaml, the primary approach to building parallel systems is by using message passing. Libraries like Async (covered in Real World OCaml) and Lwt that make it easy to build concurrent programs are a critical building block for making larger scale parallel programs. One advantage of using message passing is that your applications can scale to data-center scales, rather than only scaling within a single computer.
There are various libraries that help you build parallel programs, from Jane Street's Async_parallel, to Lwt's Release module, to the Parmap combinator library. There are also extensions such as JoCaml that provide richer type system extensions for distributed message passing parallelism.
InfoQ: Are there unique features of OCaml that make it stand out?
The most distinctive aspect of OCaml is its static type system. If you're used to languages like Java, you'll be surprised at how effective it is at catching bugs at the earliest stage of development. At the same time, the type system is very lightweight, with code that performs like a compiled language but is as terse as a scripting language like Python or Ruby.
OCaml's `match` statement is a particularly effective tool, providing a form of data-structure-driven case analysis, where the compiler provides compile-time guarantees that you haven't missed any cases (see here). It's also highly efficient and concise (see here).
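A minimal sketch of what that looks like in practice, using a made-up `expr` type for a tiny arithmetic language (the type and the `eval` function are illustrative and not taken from the book):

```ocaml
(* A variant type: every expression is exactly one of these cases. *)
type expr =
  | Num of int
  | Neg of expr
  | Add of expr * expr
  | Mul of expr * expr

(* The match follows the shape of the data, one branch per constructor. *)
let rec eval (e : expr) : int =
  match e with
  | Num n -> n
  | Neg a -> - (eval a)
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
```

If one of the branches is deleted, or a new constructor is added to `expr` without updating `eval`, the compiler reports at compile time that the pattern matching is not exhaustive and gives an example of an unmatched value.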
InfoQ: Many developers prefer JVM based languages because they get access to lots of libraries. What's the OCaml ecosystem like?
The OCaml ecosystem has made huge steps forward in the last 2-3 years. The biggest improvement is the arrival of OPAM, a sophisticated package manager for OCaml that lets you install packages with complex dependencies with a minimum of fuss. OPAM also lets you try out variants of the compiler very easily, making it easier for the compiler devs to get feedback on their work.
Of course, Java's ecosystem is way larger than OCaml's, and there's no denying the value of that. But OCaml's tools and libraries are very good and improving rapidly, and they're built for the modern distributed workflow that many of us are now familiar with due to GitHub and Bitbucket.
InfoQ: How often does the word "monad" occur in the book? Are monads a popular topic in the OCaml space, or does the OCaml community, like the F# community, have different solutions for the problems monads solve in Haskell? Bonus question: what's your favorite monad?
Monads are important in OCaml, but it's different from a language like Haskell where all imperative programming must be done with a monad. In Haskell, any program that interacts with the outside world (i.e., any useful program) needs to use monads. Because monads aren't needed for imperative programming, monads tend to be more of an advanced topic in OCaml. That said, they're very useful.
My favorite monad is no doubt Async, a library for concurrent programming where a monad is used to represent a computation that may block for a non-deterministic period of time. We use the Async monad to build an HTTP client for the DuckDuckGo search engine in Real World OCaml, including error handling.
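For readers who have not met monads in OCaml, here is a minimal sketch using the built-in `option` type rather than Async, so that it runs with the standard library alone. The operator `>>=` is defined locally, and `safe_div`, `parse_int` and `div_strings` are hypothetical names for this example. Each step runs only if the previous one produced a value, which is the same sequencing pattern a concurrency monad uses for computations that may block.

```ocaml
(* bind for option: run [f] only if the previous step produced a value. *)
let ( >>= ) (m : 'a option) (f : 'a -> 'b option) : 'b option =
  match m with
  | None -> None
  | Some x -> f x

(* Division that reports failure as None instead of raising. *)
let safe_div (a : int) (b : int) : int option =
  if b = 0 then None else Some (a / b)

(* Parse a string as an int without raising (needs OCaml 4.05 or later). *)
let parse_int (s : string) : int option = int_of_string_opt s

(* Parse two strings and divide them; any failing step short-circuits. *)
let div_strings (a : string) (b : string) : int option =
  parse_int a >>= fun x ->
  parse_int b >>= fun y ->
  safe_div x y
```

Here `div_strings "10" "2"` evaluates to `Some 5`, while `div_strings "10" "0"` and `div_strings "ten" "2"` both evaluate to `None`. Nothing stops a `print_string` call from being added inside any of these functions; side effects are not tracked by the type system, which is why monads are optional for imperative code.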
InfoQ: On the topic of meta programming: in the book you mention syntax extensions - are they like macros or compiler plugins in other languages? What's the code representation they work with?
OCaml's meta-programming story is one of the more unique aspects of the language, and is currently under active development. For many years now, we've had `camlp4`, which is a very powerful tool that can be used to arbitrarily change the surface syntax of the language. It operates as a distinct front-end for the compiler, converting source code into OCaml ASTs for the rest of the toolchain to consume. Camlp4 permits dynamically loadable extensions to the language grammar, which can range from adding keywords to entirely new domain-specific languages being embedded into OCaml source code. `camlp4` has been a huge success, leading to a wide variety of important syntax extensions.
The most widely used ones are those that auto-generate functionality for new types, like sexplib, which generates converters to and from s-expressions, or pa_compare, which generates efficient type-specific comparison functions. The COW (Caml on the Web) syntax extension permits XML, HTML, CSS and JSON to be written directly inside OCaml code.
The next version of OCaml (4.02) has also added "extension points" to the core language to make it easier to integrate this functionality with external IDE tools. Extension points develop a single syntax that can accommodate the vast majority of existing extensions. Once this is done, syntax extensions can be implemented as simple AST to AST transformers, with no special parser or AST required. This resulting code will allow for better IDE-like tooling from projects such as Merlin, which has rapidly become the magic IDE tool of choice for many OCaml developers. It adds name completion, Visual Studio-style "intellisense", build integration and many other features into Vim and Emacs, with support for other IDEs under development.
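To give a feel for what such a generator saves you from writing, here is a hand-written sketch of converters for a small record type. It is an approximation for illustration only: the `sexp` type is defined locally so the example is self-contained, the `point` type is hypothetical, and returning an `option` from `point_of_sexp` is a choice made for this sketch rather than a description of the generated code.

```ocaml
(* A minimal s-expression type, defined here to keep the sketch standalone. *)
type sexp =
  | Atom of string
  | List of sexp list

(* Hypothetical record type we want to serialise. *)
type point = { x : int; y : int }

(* The kind of boilerplate a generator would produce for [point]. *)
let sexp_of_point (p : point) : sexp =
  List [ List [ Atom "x"; Atom (string_of_int p.x) ];
         List [ Atom "y"; Atom (string_of_int p.y) ] ]

(* The inverse direction; returns None on malformed input. *)
let point_of_sexp (s : sexp) : point option =
  match s with
  | List [ List [ Atom "x"; Atom x ]; List [ Atom "y"; Atom y ] ] ->
    (match int_of_string_opt x, int_of_string_opt y with
     | Some x, Some y -> Some { x; y }
     | _ -> None)
  | _ -> None

(* Render an s-expression as text. *)
let rec to_string (s : sexp) : string =
  match s with
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"
```

With these definitions, `to_string (sexp_of_point { x = 1; y = 2 })` evaluates to `"((x 1) (y 2))"`. Writing and maintaining this by hand for every type is tedious and error-prone, which is the gap the syntax extensions fill.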
InfoQ: Where does the OCaml community hang out? Are there common sites or resources that every OCaml newbie must know about?
A good place to get your bearings on the OCaml community is the OCaml.org website. The OCaml mailing lists list a number of e-mail groups, with the main list being the oldest and best forum for the language. If you're just starting out, you might prefer the beginners list.
There are also events where you can meet other OCaml folk in the flesh, including the yearly OCaml Users and Developers Workshop, the periodic OCaml hacking sessions at OCaml Labs in Cambridge, and the OCaml Meetups in Paris and New York, as well as many other meetings.
There's also an OCaml Planet blog aggregator that is useful to follow if you want to track what many OCaml developers are working on. The IRC channel on FreeNode is also rather popular, and popping in there for quick questions often gets a good response.
About the Book Authors
Anil Madhavapeddy is a Senior Research Fellow at the University of Cambridge, based in the Systems Research Group. He was on the original team that developed the Xen hypervisor, and helped develop an industry-leading cloud management toolstack written entirely in OCaml. Prior to obtaining his PhD in 2006 from the University of Cambridge, Anil had a diverse background in industry at Network Appliance, NASA and Internet Vision. In addition to professional and academic activities, he is an active member of the open-source development community with the OpenBSD operating system, is co-chair of the Commercial Uses of Functional Programming workshop, and an author of the O'Reilly Real World OCaml book.
Yaron Minsky obtained his PhD in Computer Science from Cornell University in 2002, focusing on distributed systems. In 2003, he joined Jane Street, where he founded the quantitative research and technology group.