According to a study, the most expressive general-purpose languages are Clojure, CoffeeScript and Haskell. The study uses LoC/commit as the measuring unit of expressiveness.
Donnie Berkholz, RedMonk's resident PhD, has conducted a study meant to quantify the expressiveness of various programming languages. The study is based on data provided by Ohloh, a repository keeping track of over 500,000 open source projects written in about 100 languages spanning around 20 years.
Berkholz used LoC/commit as the unit for measuring expressiveness, starting from the assumption that “commits are generally used to add a single conceptual piece”. The results are not a measure of maintainability or productivity, nor do they tell how readable the resulting code is or how long it takes to write it.
The following graphic shows the expressiveness of over 50 languages, colored by their popularity according to RedMonk’s language rankings published earlier this year (red for the most popular languages, blue for the 2nd tier in popularity, and black for the 3rd tier):
Each language has its LoC/commit distributed over a range, since the study covers many different projects per language, each with its own average. Languages are ranked by their median, shown as the black line inside the box, which is the LoC/commit value that 50% of the corresponding projects fall below. The bottom and the top of the box mark the 25% and 75% points of the projects, while the whiskers go down to 10% and up to 90%.
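To make the box-plot reading concrete, the following is a minimal Python sketch of how the five values drawn for one language could be computed from per-project averages. The function names, the linear-interpolation percentile method and the sample numbers are all assumptions made for illustration; the study does not say how its percentiles were calculated.

```python
def percentile(sorted_values, p):
    """Percentile p (0..100) of an already sorted list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def box_summary(loc_per_commit):
    """Return the five values plotted for one language in the graphic."""
    values = sorted(loc_per_commit)
    return {
        "whisker_low": percentile(values, 10),   # bottom whisker
        "box_bottom": percentile(values, 25),    # bottom of the box
        "median": percentile(values, 50),        # black line, used for ranking
        "box_top": percentile(values, 75),       # top of the box
        "whisker_high": percentile(values, 90),  # top whisker
    }


# Made-up per-project LoC/commit averages for one hypothetical language.
sample = [40, 55, 60, 72, 80, 95, 110, 130, 150, 200, 320]
summary = box_summary(sample)
print(summary)
```

A lower median means fewer lines per commit, which the study interprets as higher expressiveness.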
Some of Berkholz’s conclusions are:
Third-tier languages are heavily biased toward high expressiveness.
Functional languages tend to be highly expressive.
Domain-specific languages are biased toward high expressiveness.
Compilation does not imply lower expressiveness.
CoffeeScript (#6) appears dramatically more expressive than JavaScript (#51), in fact among the best of all languages.
Clojure (#7) is the most expressive of Lisp variants.
Although Go (#24) is getting increasingly hot, it’s not outstandingly expressive. … Despite that, it does trump all the tier-one languages, so someone who only had experience with them could certainly see an improvement when trying Go.
The conclusion that “Third-tier languages are heavily biased toward high expressiveness” makes one wonder why highly expressive languages do not become popular. Does their conciseness make it difficult for the average programmer to grasp and use such languages? Are there other reasons?
Berkholz also ranked languages by the consistency of their expressiveness, measured by the height of the box, resulting in the next graphic:
Berkholz’s conclusions are:
Tier-one languages put in a much stronger showing here.
Tier-one languages tend to be remarkably consistent, regardless of their expressiveness.
This suggests that a primary characteristic of a tier-one language is its predictability, even more so than its productivity.
Tier-three languages make a poorer showing here.
Java turns in the strongest performance of “enterprisey” languages (C, C++, Java).
CoffeeScript is #1 for consistency, with an IQR spread of only 23 LOC/commit compared to even #4 Clojure at 51 LOC/commit.
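The consistency ranking above rests on the interquartile range (IQR), that is, the height of the box: the 75% value minus the 25% value. A small Python sketch of such a ranking follows; the language names and quartile values are invented placeholders, not figures from the study, and the function name is hypothetical.

```python
def rank_by_consistency(quartiles):
    """Rank languages by IQR (q3 - q1) in LoC/commit; a smaller spread means more consistent.

    `quartiles` maps a language name to a (q1, q3) pair.
    """
    spreads = {lang: q3 - q1 for lang, (q1, q3) in quartiles.items()}
    return sorted(spreads.items(), key=lambda item: item[1])


# Invented placeholder data, chosen only to illustrate the calculation.
hypothetical = {
    "LangA": (100, 151),
    "LangB": (200, 223),
    "LangC": (300, 420),
}

ranking = rank_by_consistency(hypothetical)
for position, (lang, iqr) in enumerate(ranking, start=1):
    print(f"#{position} {lang}: IQR {iqr} LoC/commit")
```

Note that this ranking ignores where the box sits: a language can be very consistent without being very expressive, which is why the two rankings are reported separately.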
Based on the expressiveness consistency results and RedMonk’s ranking of language popularity, Berkholz concludes that Clojure, CoffeeScript and Haskell are the most expressive general-purpose languages. His study is partially backed up by another study, conducted by David R. MacIver, who interviewed 2576 programmers using the Hammer Principle. According to MacIver, the most expressive languages are Haskell, Clojure and Scala, while the least expressive are C, PHP and, last of all, TCL. MacIver’s study did not include CoffeeScript.
Berkholz’s post triggered a large number of comments on the original post, on Hacker News, and on Twitter. Many commenters argued that LoC/commit does not accurately represent the expressiveness of a language, that expressiveness should take code readability and maintainability into account, and that DSLs should not be included, among other objections.
Berkholz insists that his study is not about language readability and maintainability, but “rather something about the state of the code in the repository, the development practices in use, potentially the level of bugs you're likely to get (given the correlation between bugs and LOC)”, explaining in greater detail in a separate post why he used LoC/commit to measure expressiveness.