BT

Lambdas in Java: An In-Depth Analysis

Posted by Alex Blewitt on Jun 03, 2010 |

With the acquisition of Sun Microsystems out of the way, Oracle can get down to the serious business of revitalising what many have come to see as a stagnant language. High on many people's requirements is the ability to be able to pass functions around that are independent of classes, so that functions can be used as arguments to other function or method calls. Functional languages like Haskell and F# exist purely using this paradigm, and functional programming goes back eight decades to the Lambda Calculus, leading to the term Lambda being used in many languages to describe anonymous functions.

Although languages like Python and Ruby have had lambdas for a long time, the recent popularity of JVM-based languages (notably Groovy and Scala) have rekindled demand for lambdas in Java as well. So when it was announced and proposed as an OpenJDK project, many were excited, even if they were confused between lambdas and closures. There is also an InfoQ news post available for those interested in learning more about this discussion (including the debate over the syntax of lambdas in Java).

Lambdas and Closures

The correct name for an anonymous function is always a lambda, or the Greek letter λ. The idea is that a function is purely a result of its arguments; in lambda calculus, the increment function looks like λx.x+1, which corresponds to a python function similar to def inc(x): return x+1 although that is bound to 'inc'. (You can do pure lambdas in Python with lambda x:x+1, and assign the result to a variable or store in another data structure.)

It's also possible to write incomplete functions which rely on external state, rather than being entirely defined by their arguments. For example, λx. x+y is an open function; it can't be evaluated until y is known. The act of capturing local (lexical) state, such as λy. λx. x+y brings closure to the function; all variables are now captured and so is referred to as a 'closed' term. (The name closure refers to this act of making an open term closed.) So whilst all anonymous functions are lambdas, they're not always closures.

In fact, Java has had closures (in the form of inner classes) since Java 1.1. Consider the following code snippet:

public interface IFilter {
  public boolean filter(int x);
}
public class FilterFactory {
  public static IFilter greaterThan(final int i) {
    return new IFilter() {
      public boolean filter(int x) {
        // i is captured from outside lexical scope
        return x > i;
      }
    };
  }
}

In the code sample above, the FilterFactory has a factory method greaterThan, that when invoked, will return a closure over the argument. Call this same piece of code with different arguments, and you end up with different closures, each of which captures its own value. The same is true if we have a lambda expression λi. λx. x > i as the enclosing lambda (which takes i as its argument) will return a function, that when evaluated, will compare its argument to the value of i at the time of closure. Invoke it twice with 3 and 4, and you get different closures, which are λx. x > 3 and λx. x > 4. So referring to anonymous functions as closures is semantically incorrect; not all lambdas are closures, and not all closures are lambdas. But all anonymous functions are lambdas.

Lambda strawman

The initial proposal was posted in December (based on thoughts on a plane whilst travelling to Devoxx), with a more formal 0.1 in January, and a follow-up 0.1.5 in February. Since then, the mailing list has been quiet until a recent flurry of activity in May sparked off another discussion and a new proposal and translation document. Work appears to be going on behind the scenes; though the mailing list is mostly externally interested parties – all the work is being done by Oracle behind closed doors. Although language changes go through the JCP, there is no news yet on what changes (if any) there are to be on the JCP itself, and the Harmony issue is still unresolved.

A bigger concern is one of time-line. Back in April, Neil Gafter commented that the delay to the Lambda project would be inconsistent with the (as then) JDK timeline. (Others have observed that the JDK7 was always going to slip.) What we can be certain of now is that Lambda is still a work in progress, and that JDK7 still hasn't reached release candidate status yet. Whether one will still impact the other remains to be seen.

Recently, an initial implementation was made available in OpenJDK, implementing parts of the proposed implementation. Note that the syntax and descriptions in this article are based on the current strawman proposal; when the final specs are published, the article will be updated to reflect that.

Lambdas in Java

Lambdas are related to a new feature in JSR292 called Method Handles, which allow a direct reference to a method syntactically (rather than having to go through several layers of reflection indirection). Apart from anything else, a VM-level concept of a method handle will permit greater optimisations in in-lining of methods, which is the single biggest win that a JIT performs.

The proposed syntax for representing a lambda in Java is using the # character, followed by arguments and then either an expression or a block. This hasn't changed since the initial proposal, so it's fairly likely that it will continue through to the final specification, though of course it may change. (It also follows project coin's decision to use # to denote exotic method names, as previously covered.) For example, the increment lambda above can be represented as either:

inc = #(int x) (x+1); // single expression
inc2 = #(int x) { return x+1; }; // block

Lambdas will also have the ability to capture variables from the surrounding local scope, so it will be possible to write an equivalent of the addition by doing:

int y = ...;
inc = #(int x) (x+y);

This can be generalised to capturing variables from a function to generate a function factory, like:

public #int(int) incrementFactory(int y) {
  return #(int x) (x+y);
}

The type #int(int) is the type of the lambda itself, which means 'takes an int and returns an int' in this case.

To invoke a lambda, ideally we'd like to do:

inc(3);

Unfortunately, that doesn't quite work. The problem is that Java has different namespaces for fields and methods (unlike C++, which shares a common namespace). So whilst in C++ you can't have a field called foo and a method called foo(), in Java that's perfectly possible:

public class FooMeister {
  private int foo;
  public int foo() {
    return foo;
  }
}

What does this have to do with lambdas? Well, the problem occurs when you assign a lambda to an instance field in a class.

public class Incrementer {
  private static #int(int) inc = #(int x) (x+1);
  private static int inc(int x) {
    return x+2;
  }
}

If we invoke Incrementer.inc(3), what value do we get back? If we bind to the lambda, we get 4, whereas if we bind to the method, we get 5. As a result, lambdas cannot use the same syntax for method invocation as it would lead to ambiguity in certain cases.

The current proposal is to use a period and parenthesis to invoke a lambda. So in order to invoke the lambda from above, we would do Incrementer.inc.(3) instead. Note the additional period from the previous example; that's what makes a lambda invocation different from a normal method call. In fact, you don't even need to assign it to a variable; you can do #(int x) (x+1).(3) if you wanted to.

Lambdas and arrays

Having covered the basics of lambdas, there are a few gotchas to be aware of. Consider this code example:

#String(String)[] arrayOfFunction = new #String(String)[10];
Object[] array = arrayOfFunction;
array[0] = #void() {throw new Exception()};

The problem here is that if the first assignment is allowable, then so too is the second (since any reference type can be cast to Object), which then permits us to assign a function type of a different signature to the array. As a result, code compiled to invoke arrayOfFunction[0].("Foo") will fail, since the lambda at array element zero is not capable of accepting a String as an argument.

Unfortunately, this isn't a problem with the particular lambda implementation; rather, it's a weakness in the Java type system which allows anything to be cast to Object (and by extension, array of Objects). However, whilst generics will result in a ClassCastException at runtime, this actually leaks functions with incompatible signatures which aren't represented as part of the array's type.

As a result, arrays-of-lambdas won't be permitted in Java; though genericised List will be permitted.

Shared mutable state and Effectively final

Java's convention for inner classes is to require all variables captured by the inner class with the final keyword. Although this prevents subsequent modification of the variables post-capture (thus simplifying the implementation of inner classes substantially), it forces developers to type additional characters into code when wrapping as an inner class – or letting the IDE do this automatically.

To simplify the use cases to which lambdas are expected to be put, the new concept of effectively final has been created. This permits non-final values to be captured, provider that the compiler can verify that they are in fact unchanged by the body of the method. In effect, this promotes what the IDE can do into the compiler phase.

However, if the captured variables are modifiable, then this introduces a potential for significant additional overhead in the capture of the lambda. As such, if variables are captured by a lambda and are modified by it (or afterwards in the lambda capture scope) then this must be explicitly marked with the shared keyword. In effect, this makes local variables captured by lambdas final by default, but requires an additional keyword to indicate mutability.

public int total(List list) {
  shared int total = 0;
  list.forEach(#void(Integer i) {
    total += i;
  });
  return total;
}

The problem with mutable state is that it requires a different implementation to purely functional approaches; for example, if the lambda is executed on a different thread or escapes the method then it may be necessary to represent the binding in a separate heap-allocated object. For example, this has the same effect (as used for inner classes):

public int total(List list) {
  final int total = new int[1];
  list.forEach(#void(Integer i) {
    total[0] += i;
  });
  return total;
}

The key thing to note is that whilst the former may be clearer in intent, both will result in the construction of a heap object which may outlive the method call. (There's no guarantee that the lambda passed into the forEach isn't stored in an internal variable, which would leak the one-element integer array, for example.)

Shared mutable state may turn out to be an anti-pattern in Java lambdas; in particular, these kind of summation processes are often better with some kind of fold or reduce approach rather than an imperative summation.

In particular, lambdas that used shared mutable state are unlikely to work well in a multi-threaded environment, unless access to the accumulator is guarded with synchronized accessors.

Extending the Collections classes

Although the availability of lambdas in the Java language are independent to their adoption in the Collections classes, it is clearly desirable to be able to perform functional operations over collections classes. (The above examples assume the existence of the not-yet-present forEach method on List.)

However, many clients may have implemented the List interfaces already – so adding a new method to the interface would break backward compatibility with previous generations. To solve this problem, a separate proposal for defender methods has been suggested. This defines a 'default' method for an interface if one isn't present, which binds to a static method on a helper class. For example, we might have:

public interface List {
  ...
  extension public void forEach(#void(Object))
    default Collections.forEach;
}
public class Collections {
  public void forEach(Collection c, #void(object) f) {
    for(i:c)
      f.(i);
  }
}

As a result, any caller invoking List.forEach(#) would then be calling the implementation at Collections.forEach, which would have the additional parameter to take the original target. However, implementations are free to implement this default with a more performant optimisation:

public class ArrayList implements List {
  private Object elements[];
  public void forEach(#void(Object o) f) {
    for(int i=0;i<elements.length;i++)
      f.(elements[i]);
  }
}

There are however concerns; for example, retro-fitting of the Collections classes feel rushed through in order to meet the same time frame as the lambda classes. It would be possible to extend the Java languages to support lambdas without supporting lambda operations on the java.util collections classes, and then make them available as part of a subsequent JDK.

Play it again, SAM

A common pattern for current functional Java implementations is to make extensive use of interfaces with a single access method, or SAM. Many filter operations will have something like a Predicate or Function which are currently represented as a subclass that implements a given interface. For example, one can filter a list of Integers using:

Iterables.filter(list,new Predicate() {
  public boolean apply(Integer i) {
    return(i > 5 );
  }
});

However, the implementation could be simplified if SAM methods on interfaces were allowable:

Iterables.filter(list,#(Integer i) (i > 5));

In addition, this would be more performant, as would not be necessary to create a new instance of the Predicate each time this code was evaluated; and if JSR292 improvements result in the extension of the dynamic methods to support an asSam() call, the Method-to-instance-of-interface could be understood by the JVM to optimise away all object related calls. However, this requires support outside the scope of the lambda group and may not be possible in the timeframe given.

The meaning of this and non-local returns

There is another open question regarding the treatment of this inside a lambda; does it refer to the lambda itself or the enclosing object instance? Since a lambda is a reference type, being able to get hold of it inside would permit a recursive function to be implemented. For example, one might have:

fib = #int(int i) { 
  if(i == 0 || i == 1 ) 
    return 1 
  else
    return this.(i-1) + this.(i-1);
}

Without this, one has to pull out the reference to the variable to which it is being assigned (i.e. the RHS depends on the LHS of an assignment). Conversely, it makes referring to the enclosing object more difficult; one has to prepend the class name, much like inner classes:

public class EnclosingScope {
  #Object() { return EnclosingScope.this; } 
}

It may be the case that this is disallowed inside a lambda in the initial version, though prefixing this with the enclosing scope will be allowed in any case.

The other consideration is the meaning of control-flow operators like return and break. Some languages, like Self, allow non-local returns in blocks, in which the block itself can trigger a return of the enclosing method. Similarly, control flow (looping with break and continue etc.) may trigger events outside the immediate lambda.

At the moment, the semantics of return seems tightly bound to the return of the enclosing lambda, rather than any outer construct. However, exceptions work as expected; an exception thrown from within a lambda will propagate up to the enclosing method, and from there up the method invocation chain. This is similar to the way exceptions are handled in the BGGA proposal.

Conclusion

The lambda draft is progressing, although not in the originally proposed timeframe of the JDK 7 release plan. With the recent addition of defender methods as well, it is unlikely that a full implementation will be ready for this Summer. Whether JDK 7 will ship without lambdas (or without a retro-fitted Collections API) or whether JDK 7 will be delayed to accommodate the lambda extensions remains to be seen.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Errata by Sake .

def inc(x): x+1

should be

def inc(x): return x+1

Re: Errata by Ryan Slobojan

Hi Sakesun,

Thanks for the catch - this has been updated.

Ryan Slobojan
Chief Editor, InfoQ.com

ugly call syntax by Francisco Gómez

I really do not like the current proposal of using a period and parenthesis to invoke a lambda
Incrementer.inc.(3)
#(int x) (x+1).(3)

it would be nicer something like

Incrementer.#inc(3)
#(int x) (x+1)(3)

or

Incrementer.*inc(3)
#(int x) (x+1)(3)

Errata by Ievgen Lukash


public int total(List list) {
final int total = new int[1];
list.forEach(#void(Integer i) {
total[0] += i;
});
return total; // <--- total[0]
}

Errata by Sebastian Kübeck

shouldn't the Fibonacci function be:


fib = #int(int i) {
if(i == 0 || i == 1 )
return 1
else
return this.(i-1) + this.(i-2);
}


see: en.wikipedia.org/wiki/Fibonacci_number

Not Lambda? by Hendry Luk

That syntax doesnt seem to be "lambda" as commonly known in most programming languages. It's merely just anonymous function.
Lambda in most languages allows us to pass a syntax expression that can be evaluated (not executed) as data at runtime, e.g. into Abstract Syntax Tree. The recipient can then use this "data" and do something based on that information. Common example is Linq in C# which turns lambda expressions into sql-script, or web-service package, or REST call.
Anonymous function is different beast since it only pass an executable function, not intrepretable expression. This thing is exactly that.

Re: ugly call syntax by Michael Ahern

That would get confusing (and ambiguous) when calling lambda from a function that returned a lambda. For example if this returned a lambda:
Incrementor.inc()
How could I later invoke the resulting lambda:
Incrementor.#inc()(x)
like that?

In the context of the accessor method the proposed syntax is unambiguous:
Incrementor.inc().(x)

I am not a huge fan of the syntax either by the way, but it seems like it is the best that can be done given the limitations java imposes.

Re: ugly call syntax by Mario Fusco

I agree. The syntax is honestly ugly. I don't understand why somebody could want to deal with this stuff instead of moving toward more modern languages like Scala.

And if you are obliged to develop in java there are some libraries and tools like lambdaj that already allow to do something similar in java 6 with a similar syntax.

Unfortunate use of dot to initiate a call by Kevlin Henney

The reasons given for using a dot to call inc, i.e., inc.(3), rather than the obvious inc(3) appear a little spurious and, as such, surprising given the experience of those involved in the proposal. It seems that the motivation and implications have not been fully reviewed. Hopefully this syntactic noise can be removed before the final version.

One of the uses to which lambdas can be put is to provide what effectively appear to be dynamically bindable methods, e.g., public lambdas. However, the transparency is lost with the noise dot. This is not only a quirk that needs to be learnt and remembered, it is one that potentially restricts the usefulness and usability of lambdas.

What of the problem case outlined? This is not a legitimate problem case, it is simply an example of poor code and a narrow view of what the language already does. Why inconvenience the majority to support a pathological minority? Syntax should favour and simplify the common case. The edge case is one that very (very) few people will ever encounter intentionally or accidentally (most Java developers don't even know that a field can have the same name as a method).

We can look at two points where we could resolve this without twisting the syntax: at the point of definition or at the point of call.

If we allow fields and methods to be overloaded, we already disallow clashes where there is a duplicate overload. Just try defining two methods with the same call signature. Furthermore, try defining a static and a non-static method with the same name and same argument signature. The compiler rejects these. Given that the signature of a lambda field is known at compile time, this rule can be extended to cover potential ambiguities across lambda fields as well. In other words, the example described in the article would be rejected on these grounds.

The other point at which the compiler can reject such ambiguities is at the point of call. Overload two methods so that one takes (int, long) and the other takes (long, int). These are accepted by the compiler, but if the call is made with (0, 0) this is considered ambiguous and rejected. A call that could ambiguously be either to a method or to a lambda field would be rejected at the point of call. A cast could be used to disambiguate the call, but it is likely we would consider such a conundrum an example of questionable coding, just as we would overloading on (int, long) and (long, int).

It is important to find problems and edge cases in any proposal, but the use to which these cases are put needs to be considered. Instead of working around the cases by changing the syntax, work backwards from the desired syntax and determine what would be involved in supporting it rather than rejecting it.

Re: Unfortunate use of dot to initiate a call by Yoav Abrahami

I totally agree. Exception cases should be treated as exception cases, you should not base the syntax on them.

We can easily drop the . syntax to make a lambda call and still have a nice consistent language. In the case of name clashes, it should be considered an exception and issue a compilation error.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

10 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT