BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles GS Collections by Example – Part 1

GS Collections by Example – Part 1

Leia em Português

This item in japanese

Lire ce contenu en français

I am a Java Developer, Tech Fellow and Managing Director at Goldman Sachs. I am the creator of the GS Collections framework that Goldman Sachs open sourced in January 2012. I am also a former Smalltalk developer.

When I started working with Java, there were two things that I missed.

  1. Smalltalk’s block closures (aka lambdas)
  2. The wonderfully feature rich Smalltalk Collections Framework.

I wanted both of these features as well as compatibility with the existing Java Collections interfaces. Around 2004, I realized that no one was going to give me everything I was looking for in Java. I also knew at this point that I would likely be programming in Java for at least the next 10-15 years of my career. So I decided to start building what I was looking for.

Fast forward 10 years. I now have almost everything I ever wanted in Java. I have support for lambdas in Java 8, and I now get to use lambdas and method references with arguably the most feature rich Java collections framework available – GS Collections.

Here is a comparison of the features available in GS Collections, Java 8, Guava, Trove and Scala. These may not be all of the features you are looking for in a Collections framework, but they are the features that I have needed or other GS developers I work with have needed in Java over the last 10+ years.

Features

GSC 5.0

Java 8

Guava

Trove

Scala

Rich API

 

Interfaces

Readable, Mutable, Immutable, FixedSize, Lazy

Mutable, Stream

Mutable, Fluent

Mutable

Readable, Mutable, Immutable, Lazy

Optimized Set & Map

 (+Bag)

   

 

Immutable Collections

 

 

Primitive Collections

✓ (+Bag, +Immutable)

   

 

Multimaps

✓ (+Bag, +SortedBag)

 

✓ (+Linked)

 

(Multimap trait)

Bags (Multisets)

 

   

BiMaps

 

   

Iteration Styles

Eager/Lazy,
Serial/Parallel
Lazy,
Serial/Parallel
Lazy,
Serial
Eager,
Serial

Eager/Lazy, Serial/Parallel (Lazy Only)

I described the combination of the features that I believe make GS Collections interesting in an interview with jClarity last year. You can read them in their original form here.

Why would you use GS Collections now that Java 8 is out and includes the Streams API? While the Streams API is a big improvement to the Java Collections Framework, it doesn’t have all the features you might want. As shown in the matrix above, GS Collections has multimaps, bags, immutable containers, and primitive containers. GS Collections has optimized replacements for HashSet and HashMap, and its Bags and Multimaps build on those optimized types. The GS Collections iteration patterns are on the collections interfaces so there’s no need to “enter” the API with a call to stream() and “exit” the API with a call to collect(). This results in much more succinct code in many cases. Finally, GS Collections is compatible back to Java 5. This is a particularly important feature for library developers, since they tend to support their library on older versions of Java well after a new major release comes out.

I will show some examples of how you can leverage these features in many different ways. These examples are variants of the exercises in the GS Collections Kata; a training class we use inside of Goldman Sachs to teach our developers how to use GS Collections. We open sourced this training as a separate repository in GitHub.

Example 1: Filtering a collection

One of the most common things you will want to do with GS Collections is to filter a collection. GS Collections provides several different ways to accomplish this.

In the GS Collections Kata, we will often start with a list of customers. In one of the exercises, I want to filter the list of customers down to a list which only includes customers who live in London. The following code shows how I can accomplish this using an iteration pattern named “select”.

import com.gs.collections.api.list.MutableList; 
import com.gs.collections.impl.test.Verify; 

@Test public void getLondonCustomers() { 
      MutableList<Customer> customers = this.company.getCustomers(); 
      MutableList<Customer> londonCustomers = customers.select(c -> c.livesIn("London")); 
      Verify.assertSize("Should be 2 London customers", 2, londonCustomers); 
} 

The select method on MutableList returns a MutableList. This code executes eagerly, meaning all computation to select the matching elements from the source list and add them into the target list has been performed by the time the call to select() completes. The name “select” comes from the Smalltalk heritage. Smalltalk has a basic set of collection protocols named select (aka filter), reject (aka filterNot), collect (aka map, transform), detect (aka findOne), detectIfNone, injectInto (aka foldLeft), anySatisfy and allSatisfy.

If I wanted to accomplish the same thing using lazy evaluation, I would write it as follows:

MutableList<Customer> customers = this.company.getCustomers(); 
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Verify.assertIterableSize(2, londonCustomers); 

Here I have added a call to a method named asLazy(). All of the other code has stayed pretty much the same. The return type of select has changed because of the call to asLazy(). Instead of a MutableList<Customer>, now I get back a LazyIterable<Customer>. This is pretty much the equivalent of the following code using the new Streams API in Java 8:

List<Customer> customers = this.company.getCustomers(); 
Stream<Customer> stream = customers.stream().filter(c -> c.livesIn("London")); 
List<Customer> londonCustomers = stream.collect(Collectors.toList()); 
Verify.assertSize(2, londonCustomers); 

Here the method stream() and then the call to filter() return a Stream<Customer>. In order to test the size, I have to either convert the Stream to a List as above, or I can use the Java 8 Stream.count() method:

List<Customer> customers = this.company.getCustomers(); 
Stream<Customer> stream = customers.stream().filter(c -> c.livesIn("London")); 
Assert.assertEquals(2, stream.count()); 

Both GS Collections interfaces MutableList and LazyIterable share a common ancestor named RichIterable. In fact, I could write all of this code just using RichIterable. Here’s the example using only RichIterable<Customer>, first lazily

RichIterable<Customer> customers = this.company.getCustomers(); 
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Verify.assertIterableSize(2, londonCustomers); 

and then again, eagerly

RichIterable<Customer> customers = this.company.getCustomers(); 
RichIterable<Customer> londonCustomers = customers.select(c -> c.livesIn("London")); 
Verify.assertIterableSize(2, londonCustomers); 

As shown in these examples, RichIterable can be used in place of LazyIterable and MutableList, because it is a root interface for both.

It is possible that the list of customers could have been immutable. If I had an ImmutableList<Customer>, this is how the types would have changed:

ImmutableList<Customer> customers = this.company.getCustomers().toImmutable(); 
ImmutableList<Customer> londonCustomers = customers.select(c -> c.livesIn("London"));
Verify.assertIterableSize(2, londonCustomers); 

Like other RichIterables, we can iterate over ImmutableList lazily.

ImmutableList<Customer> customers = this.company.getCustomers().toImmutable(); 
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

There is a common parent interface for both MutableList and ImmutableList named ListIterable. It can be used in place of either type as a more general type. RichIterable is the parent type for ListIterable. So this code can also be more generally written as follows:

ListIterable<Customer> customers = this.company.getCustomers().toImmutable(); 
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

Or even more generally:

RichIterable<Customer> customers = this.company.getCustomers().toImmutable(); 
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

The interface hierarchy of GS Collections follows a very basic pattern. For each type (List, Set, Bag, Map), there is a readable interface (ListIterable, SetIterable, Bag, MapIterable), a mutable interface (MutableList, MutableSet, MutableBag, MutableMap), and an immutable interface (ImmutableList, ImmutableSet, ImmutableBag, ImmutableMap).

(Click on the image to enlarge it)

Figure 1. Basic GSC Container Interface Hierarchy

Here is an example of the same code using a Set instead of a List:

MutableSet<Customer> customers = this.company.getCustomers().toSet(); 
MutableSet<Customer> londonCustomers = customers.select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

Here’s a similar solution written lazily with a Set:

MutableSet<Customer> customers = this.company.getCustomers().toSet(); 
LazyIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size()); 

Here’s a solution using a Set, using the most general interface possible:

RichIterable<Customer> customers = this.company.getCustomers().toSet(); 
RichIterable<Customer> londonCustomers = customers.asLazy().select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

Next I’ll illustrate the mechanisms that can be used to convert from one container type to another. First, let’s convert from a List to a Set while filtering lazily:

MutableList<Customer> customers = this.company.getCustomers(); 
LazyIterable<Customer> lazyIterable = customers.asLazy().select(c -> c.livesIn("London")); 
MutableSet<Customer> londonCustomers = lazyIterable.toSet(); 
Assert.assertEquals(2, londonCustomers.size()); 

Because the API is very fluent, I can chain all of these methods together:

MutableSet<Customer> londonCustomers = 
       this.company.getCustomers() 
       .asLazy() 
       .select(c -> c.livesIn("London")) 
       .toSet(); 
Assert.assertEquals(2, londonCustomers.size()); 

I will leave it to the reader to decide whether this impacts readability. I tend to break fluent calls up and introduce intermediate types if I feel it will help future readers of the code to understand things better. This comes at the cost of more code to read, but this in turn can lower the cost of understanding, which can be more important for less frequent readers of the code.

I can accomplish the conversion of the List to a Set in the select method itself. The select method has an overloaded form defined which takes a Predicate as the first parameter, and a result collection as a second parameter:

MutableSet<Customer> londonCustomers = 
       this.company.getCustomers() 
       .select(c -> c.livesIn("London"), UnifiedSet.newSet()); 
Assert.assertEquals(2, londonCustomers.size()); 

Notice how I can use this same method to return any Collection type I want. In the following case I get back a MutableBag<Customer>

MutableBag<Customer> londonCustomers = 
       this.company.getCustomers() 
       .select(c -> c.livesIn("London"), HashBag.newBag()); 
Assert.assertEquals(2, londonCustomers.size()); 

In the following case I get back a CopyOnWriteArrayList, which is part of the JDK. The point is that this method will return whatever type I specify, but it has to be a class that implements java.util.Collection:

CopyOnWriteArrayList<Customer> londonCustomers = 
       this.company.getCustomers() 
       .select(c -> c.livesIn("London"), new CopyOnWriteArrayList<>()); 
Assert.assertEquals(2, londonCustomers.size()); 

We have been using a lambda in all of these examples. The method select takes a Predicate which is a functional interface in GS Collections that is defined as follows:

public interface Predicate<T> extends Serializable { 
       boolean accept(T each); 
} 

The lambda I have been using is fairly simple. I will extract it out into a separate variable so it is more clear what this lambda is representing in the code:

Predicate<Customer> predicate = c -> c.livesIn("London"); 
MutableList<Customer> londonCustomers = this.company.getCustomers().select(predicate); 
Assert.assertEquals(2, londonCustomers.size()); 

The method livesIn() which is defined on Customer is fairly simple. It is defined as follows:

public boolean livesIn(String city) { 
       return city.equals(this.city); 
} 

It would be nice if I could use a method reference instead of a lambda here, leveraging the livesIn method. But this code will not compile:

Predicate<Customer> predicate = Customer::livesIn; 

The compiler gives me the following error:

Error:(65, 37) java: incompatible types: invalid method reference 
      incompatible types: com.gs.collections.kata.Customer cannot be converted to java.lang.String 

This is because this method reference would require two parameters, a Customer and the city String. There is an alternate form of Predicate called Predicate2 that will work here.

Predicate2<Customer, String> predicate = Customer::livesIn;

Notice that Predicate2 takes two generic types, Customer and String. There is a special form of select called selectWith that can use this Predicate2.

Predicate2<Customer, String> predicate = Customer::livesIn; 
MutableList<Customer> londonCustomers = this.company.getCustomers().selectWith(predicate, "London"); 
Assert.assertEquals(2, londonCustomers.size()); 

This can be written more simply as follows by inlining the method reference:

MutableList<Customer> londonCustomers = this.company.getCustomers().selectWith(Customer::livesIn, "London"); 
Assert.assertEquals(2, londonCustomers.size());

The string “London” is passed as the second parameter to each call of the method defined on Predicate2. The first parameter will be the Customer from the list.

The method selectWith, like select, is defined on RichIterable. Therefore, everything I have previously demonstrated with select will work with selectWith. This includes support on all the different mutable and immutable interfaces, support for different co-variant types, and support for lazy iteration. There is also a form of selectWith which takes a third parameter. Similar to select with two parameters, the third parameter in selectWith can take a target collection.

For instance, the following code filters from a List to a Set using selectWith:

MutableSet<Customer> londonCustomers = 
       this.company.getCustomers() 
       .selectWith(Customer::livesIn, "London", UnifiedSet.newSet());
Assert.assertEquals(2, londonCustomers.size());

This could also be accomplished lazily with the following code:

MutableSet<Customer> londonCustomers = 
       this.company.getCustomers() 
       .asLazy() 
       .selectWith(Customer::livesIn, "London") 
       .toSet(); 
Assert.assertEquals(2, londonCustomers.size()); 

The last thing I will show is that the select and selectWith methods can be used with any collection that extends java.lang.Iterable. This includes all of the JDK types as well as any third party collection libraries. The first class that ever existed in GS Collections was a utility class named Iterate. Here is a code example that shows how to select from an Iterable using Iterate.

Iterable<Customer> customers = this.company.getCustomers(); 
Collection<Customer> londonCustomers = Iterate.select(customers, c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

The selectWith variation is also available:

Iterable<Customer> customers = this.company.getCustomers(); 
Collection<Customer> londonCustomers = Iterate.selectWith(customers, Customer::livesIn, "London"); 
Assert.assertEquals(2, londonCustomers.size()); 

There are also variations that take target collections. All of the basic iteration protocols are available on Iterate. There is also a utility class that covers lazy iteration (named LazyIterate), and it also can work with any container that extends java.lang.Iterable. For example:

Iterable<Customer> customers = this.company.getCustomers(); 
LazyIterable<Customer> londonCustomers = LazyIterate.select(customers, c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

The nicer way to accomplish this using a more object-oriented API is to use adapter classes. Here is an example using a ListAdapter with a java.util.List:

List<Customer> customers = this.company.getCustomers(); 
MutableList<Customer> londonCustomers = 
       ListAdapter.adapt(customers).select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

This can be written lazily as you might expect by now.

List<Customer> customers = this.company.getCustomers(); 
LazyIterable<Customer> londonCustomers = 
    ListAdapter.adapt(customers) 
    .asLazy() 
    .select(c -> c.livesIn("London"));
Assert.assertEquals(2, londonCustomers.size()); 

The selectWith() method will also work lazily on a ListAdapter:

List<Customer> customers = this.company.getCustomers(); 
LazyIterable<Customer> londonCustomers = 
        ListAdapter.adapt(customers) 
        .asLazy() 
        .selectWith(Customer::livesIn, "London"); 
Assert.assertEquals(2, londonCustomers.size()); 

SetAdapter can be used similarly for any implementation of java.util.Set.

Now if you have the kind of problem that can benefit from data level parallelism, you could use one of two approaches for parallelizing this problem. First I will demonstrate how to use the ParallelIterate class to solve this problem using an eager/parallel algorithm:

Iterable<Customer> customers = this.company.getCustomers(); 
Collection<Customer> londonCustomers = ParallelIterate.select(customers, c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.size()); 

The ParallelIterate class will take any Iterable as a parameter, and always returns java.util.Collection as its result. ParallelIterate has been around in GS Collections since 2005. Eager/parallel has been the only form of parallelism GS Collections has supported until the 5.0 release, when we added a lazy/parallel API to RichIterable. We do not have an eager/parallel API on RichIterable, as we felt that lazy/parallel makes more sense as a default case. We may add an eager/parallel API directly to RichIterable in the future, depending on feedback we receive on the usefulness of the lazy/parallel API.

If I wanted to solve the same problem using the lazy/parallel API, I would write the code as follows:

FastList<Customer> customers = this.company.getCustomers(); 
ParallelIterable<Customer> londonCustomers = 
     customers.asParallel(Executors.newFixedThreadPool(2), 100) 
        .select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.toList().size()); 

Today, the asParallel() method only exists on a few concrete containers in GS Collections. The API has not yet been promoted to any interfaces like MutableList, ListIterable or RichIterable. The asParallel() method takes two arguments – an ExecutorService and a batch size. In the future, we may add a version of asParallel() that calculates the batch size automatically.

I could choose to use a more specific type in this example:

FastList<Customer> customers = this.company.getCustomers(); 
ParallelListIterable<Customer> londonCustomers = 
     customers.asParallel(Executors.newFixedThreadPool(2), 100) 
          .select(c -> c.livesIn("London")); 
Assert.assertEquals(2, londonCustomers.toList().size()); 

There is a hierarchy of ParallelIterable that includes ParallelListIterable, ParallelSetIterable and ParallelBagIterable.

I have demonstrated several different ways of filtering a collection in GS Collections using select() and selectWith(). I have shown you many combinations of eager, lazy, serial and parallel iteration, using different types from the GS Collections RichIterable hierarchy.

In part 2 of this article, to be published next month, I will cover examples including collect, groupBy, flatCollect as well as some primitive containers and the rich API available on them as well. The examples I go through in part 2 will not go into quite as much detail or explore as many options, but it is worth noting that those details and options are likely available.

About the Author

Donald Raab manages the JVM Architecture team, which is part of the Enterprise Platforms group in the Technology division at Goldman Sachs. Raab served as a member of the JSR 335 Expert Group (Lambda Expressions for the Java Programming Language) and is one of the alternate representatives for Goldman Sachs on the JCP Executive Committee. He joined Goldman Sachs in 2001 as a technical architect on the PARA team. He was named technology fellow in 2007 and managing director in 2013.

For more information about GS Collections and Goldman Sachs Engineering visit www.gs.com/engineering.

Disclaimer

This article reflects information available to the Technology Division of Goldman Sachs only and not any other part of Goldman Sachs. It should not be relied upon or considered investment advice. Opinions expressed may not be those of Goldman Sachs unless otherwise expressly noted. Goldman, Sachs & Co. (“GS”) does not warrant or guarantee the accuracy, completeness or efficacy of this article, and recipients should not rely on it except at their own risk. This article may not be forwarded or otherwise disclosed except with this disclaimer intact.

Rate this Article

Adoption
Style

BT