BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Lambdas in Java: An In-Depth Analysis

Lambdas in Java: An In-Depth Analysis

This item in japanese

Bookmarks

With the acquisition of Sun Microsystems out of the way, Oracle can get down to the serious business of revitalising what many have come to see as a stagnant language. High on many people's requirements is the ability to be able to pass functions around that are independent of classes, so that functions can be used as arguments to other function or method calls. Functional languages like Haskell and F# exist purely using this paradigm, and functional programming goes back eight decades to the Lambda Calculus, leading to the term Lambda being used in many languages to describe anonymous functions.

Although languages like Python and Ruby have had lambdas for a long time, the recent popularity of JVM-based languages (notably Groovy and Scala) have rekindled demand for lambdas in Java as well. So when it was announced and proposed as an OpenJDK project, many were excited, even if they were confused between lambdas and closures. There is also an InfoQ news post available for those interested in learning more about this discussion (including the debate over the syntax of lambdas in Java).

Lambdas and Closures

The correct name for an anonymous function is always a lambda, or the Greek letter λ. The idea is that a function is purely a result of its arguments; in lambda calculus, the increment function looks like λx.x+1, which corresponds to a python function similar to def inc(x): return x+1 although that is bound to 'inc'. (You can do pure lambdas in Python with lambda x:x+1, and assign the result to a variable or store in another data structure.)

It's also possible to write incomplete functions which rely on external state, rather than being entirely defined by their arguments. For example, λx. x+y is an open function; it can't be evaluated until y is known. The act of capturing local (lexical) state, such as λy. λx. x+y brings closure to the function; all variables are now captured and so is referred to as a 'closed' term. (The name closure refers to this act of making an open term closed.) So whilst all anonymous functions are lambdas, they're not always closures.

In fact, Java has had closures (in the form of inner classes) since Java 1.1. Consider the following code snippet:

public interface IFilter {
  public boolean filter(int x);
}
public class FilterFactory {
  public static IFilter greaterThan(final int i) {
    return new IFilter() {
      public boolean filter(int x) {
        // i is captured from outside lexical scope
        return x > i;
      }
    };
  }
}

In the code sample above, the FilterFactory has a factory method greaterThan, that when invoked, will return a closure over the argument. Call this same piece of code with different arguments, and you end up with different closures, each of which captures its own value. The same is true if we have a lambda expression λi. λx. x > i as the enclosing lambda (which takes i as its argument) will return a function, that when evaluated, will compare its argument to the value of i at the time of closure. Invoke it twice with 3 and 4, and you get different closures, which are λx. x > 3 and λx. x > 4. So referring to anonymous functions as closures is semantically incorrect; not all lambdas are closures, and not all closures are lambdas. But all anonymous functions are lambdas.

Lambda strawman

The initial proposal was posted in December (based on thoughts on a plane whilst travelling to Devoxx), with a more formal 0.1 in January, and a follow-up 0.1.5 in February. Since then, the mailing list has been quiet until a recent flurry of activity in May sparked off another discussion and a new proposal and translation document. Work appears to be going on behind the scenes; though the mailing list is mostly externally interested parties – all the work is being done by Oracle behind closed doors. Although language changes go through the JCP, there is no news yet on what changes (if any) there are to be on the JCP itself, and the Harmony issue is still unresolved.

A bigger concern is one of time-line. Back in April, Neil Gafter commented that the delay to the Lambda project would be inconsistent with the (as then) JDK timeline. (Others have observed that the JDK7 was always going to slip.) What we can be certain of now is that Lambda is still a work in progress, and that JDK7 still hasn't reached release candidate status yet. Whether one will still impact the other remains to be seen.

Recently, an initial implementation was made available in OpenJDK, implementing parts of the proposed implementation. Note that the syntax and descriptions in this article are based on the current strawman proposal; when the final specs are published, the article will be updated to reflect that.

Lambdas in Java

Lambdas are related to a new feature in JSR292 called Method Handles, which allow a direct reference to a method syntactically (rather than having to go through several layers of reflection indirection). Apart from anything else, a VM-level concept of a method handle will permit greater optimisations in in-lining of methods, which is the single biggest win that a JIT performs.

The proposed syntax for representing a lambda in Java is using the # character, followed by arguments and then either an expression or a block. This hasn't changed since the initial proposal, so it's fairly likely that it will continue through to the final specification, though of course it may change. (It also follows project coin's decision to use # to denote exotic method names, as previously covered.) For example, the increment lambda above can be represented as either:

inc = #(int x) (x+1); // single expression
inc2 = #(int x) { return x+1; }; // block

Lambdas will also have the ability to capture variables from the surrounding local scope, so it will be possible to write an equivalent of the addition by doing:

int y = ...;
inc = #(int x) (x+y);

This can be generalised to capturing variables from a function to generate a function factory, like:

public #int(int) incrementFactory(int y) {
  return #(int x) (x+y);
}

The type #int(int) is the type of the lambda itself, which means 'takes an int and returns an int' in this case.

To invoke a lambda, ideally we'd like to do:

inc(3);

Unfortunately, that doesn't quite work. The problem is that Java has different namespaces for fields and methods (unlike C++, which shares a common namespace). So whilst in C++ you can't have a field called foo and a method called foo(), in Java that's perfectly possible:

public class FooMeister {
  private int foo;
  public int foo() {
    return foo;
  }
}

What does this have to do with lambdas? Well, the problem occurs when you assign a lambda to an instance field in a class.

public class Incrementer {
  private static #int(int) inc = #(int x) (x+1);
  private static int inc(int x) {
    return x+2;
  }
}

If we invoke Incrementer.inc(3), what value do we get back? If we bind to the lambda, we get 4, whereas if we bind to the method, we get 5. As a result, lambdas cannot use the same syntax for method invocation as it would lead to ambiguity in certain cases.

The current proposal is to use a period and parenthesis to invoke a lambda. So in order to invoke the lambda from above, we would do Incrementer.inc.(3) instead. Note the additional period from the previous example; that's what makes a lambda invocation different from a normal method call. In fact, you don't even need to assign it to a variable; you can do #(int x) (x+1).(3) if you wanted to.

Lambdas and arrays

Having covered the basics of lambdas, there are a few gotchas to be aware of. Consider this code example:

#String(String)[] arrayOfFunction = new #String(String)[10];
Object[] array = arrayOfFunction;
array[0] = #void() {throw new Exception()};

The problem here is that if the first assignment is allowable, then so too is the second (since any reference type can be cast to Object), which then permits us to assign a function type of a different signature to the array. As a result, code compiled to invoke arrayOfFunction[0].("Foo") will fail, since the lambda at array element zero is not capable of accepting a String as an argument.

Unfortunately, this isn't a problem with the particular lambda implementation; rather, it's a weakness in the Java type system which allows anything to be cast to Object (and by extension, array of Objects). However, whilst generics will result in a ClassCastException at runtime, this actually leaks functions with incompatible signatures which aren't represented as part of the array's type.

As a result, arrays-of-lambdas won't be permitted in Java; though genericised List will be permitted.

Shared mutable state and Effectively final

Java's convention for inner classes is to require all variables captured by the inner class with the final keyword. Although this prevents subsequent modification of the variables post-capture (thus simplifying the implementation of inner classes substantially), it forces developers to type additional characters into code when wrapping as an inner class – or letting the IDE do this automatically.

To simplify the use cases to which lambdas are expected to be put, the new concept of effectively final has been created. This permits non-final values to be captured, provider that the compiler can verify that they are in fact unchanged by the body of the method. In effect, this promotes what the IDE can do into the compiler phase.

However, if the captured variables are modifiable, then this introduces a potential for significant additional overhead in the capture of the lambda. As such, if variables are captured by a lambda and are modified by it (or afterwards in the lambda capture scope) then this must be explicitly marked with the shared keyword. In effect, this makes local variables captured by lambdas final by default, but requires an additional keyword to indicate mutability.

public int total(List list) {
  shared int total = 0;
  list.forEach(#void(Integer i) {
    total += i;
  });
  return total;
}

The problem with mutable state is that it requires a different implementation to purely functional approaches; for example, if the lambda is executed on a different thread or escapes the method then it may be necessary to represent the binding in a separate heap-allocated object. For example, this has the same effect (as used for inner classes):

public int total(List list) {
  final int total = new int[1];
  list.forEach(#void(Integer i) {
    total[0] += i;
  });
  return total;
}

The key thing to note is that whilst the former may be clearer in intent, both will result in the construction of a heap object which may outlive the method call. (There's no guarantee that the lambda passed into the forEach isn't stored in an internal variable, which would leak the one-element integer array, for example.)

Shared mutable state may turn out to be an anti-pattern in Java lambdas; in particular, these kind of summation processes are often better with some kind of fold or reduce approach rather than an imperative summation.

In particular, lambdas that used shared mutable state are unlikely to work well in a multi-threaded environment, unless access to the accumulator is guarded with synchronized accessors.

Extending the Collections classes

Although the availability of lambdas in the Java language are independent to their adoption in the Collections classes, it is clearly desirable to be able to perform functional operations over collections classes. (The above examples assume the existence of the not-yet-present forEach method on List.)

However, many clients may have implemented the List interfaces already – so adding a new method to the interface would break backward compatibility with previous generations. To solve this problem, a separate proposal for defender methods has been suggested. This defines a 'default' method for an interface if one isn't present, which binds to a static method on a helper class. For example, we might have:

public interface List {
  ...
  extension public void forEach(#void(Object))
    default Collections.forEach;
}
public class Collections {
  public void forEach(Collection c, #void(object) f) {
    for(i:c)
      f.(i);
  }
}

As a result, any caller invoking List.forEach(#) would then be calling the implementation at Collections.forEach, which would have the additional parameter to take the original target. However, implementations are free to implement this default with a more performant optimisation:

public class ArrayList implements List {
  private Object elements[];
  public void forEach(#void(Object o) f) {
    for(int i=0;i<elements.length;i++)
      f.(elements[i]);
  }
}

There are however concerns; for example, retro-fitting of the Collections classes feel rushed through in order to meet the same time frame as the lambda classes. It would be possible to extend the Java languages to support lambdas without supporting lambda operations on the java.util collections classes, and then make them available as part of a subsequent JDK.

Play it again, SAM

A common pattern for current functional Java implementations is to make extensive use of interfaces with a single access method, or SAM. Many filter operations will have something like a Predicate or Function which are currently represented as a subclass that implements a given interface. For example, one can filter a list of Integers using:

Iterables.filter(list,new Predicate() {
  public boolean apply(Integer i) {
    return(i > 5 );
  }
});

However, the implementation could be simplified if SAM methods on interfaces were allowable:

Iterables.filter(list,#(Integer i) (i > 5));

In addition, this would be more performant, as would not be necessary to create a new instance of the Predicate each time this code was evaluated; and if JSR292 improvements result in the extension of the dynamic methods to support an asSam() call, the Method-to-instance-of-interface could be understood by the JVM to optimise away all object related calls. However, this requires support outside the scope of the lambda group and may not be possible in the timeframe given.

The meaning of this and non-local returns

There is another open question regarding the treatment of this inside a lambda; does it refer to the lambda itself or the enclosing object instance? Since a lambda is a reference type, being able to get hold of it inside would permit a recursive function to be implemented. For example, one might have:

fib = #int(int i) { 
  if(i == 0 || i == 1 ) 
    return 1 
  else
    return this.(i-1) + this.(i-1);
}

Without this, one has to pull out the reference to the variable to which it is being assigned (i.e. the RHS depends on the LHS of an assignment). Conversely, it makes referring to the enclosing object more difficult; one has to prepend the class name, much like inner classes:

public class EnclosingScope {
  #Object() { return EnclosingScope.this; } 
}

It may be the case that this is disallowed inside a lambda in the initial version, though prefixing this with the enclosing scope will be allowed in any case.

The other consideration is the meaning of control-flow operators like return and break. Some languages, like Self, allow non-local returns in blocks, in which the block itself can trigger a return of the enclosing method. Similarly, control flow (looping with break and continue etc.) may trigger events outside the immediate lambda.

At the moment, the semantics of return seems tightly bound to the return of the enclosing lambda, rather than any outer construct. However, exceptions work as expected; an exception thrown from within a lambda will propagate up to the enclosing method, and from there up the method invocation chain. This is similar to the way exceptions are handled in the BGGA proposal.

Conclusion

The lambda draft is progressing, although not in the originally proposed timeframe of the JDK 7 release plan. With the recent addition of defender methods as well, it is unlikely that a full implementation will be ready for this Summer. Whether JDK 7 will ship without lambdas (or without a retro-fitted Collections API) or whether JDK 7 will be delayed to accommodate the lambda extensions remains to be seen.

Rate this Article

Adoption
Style

BT