Enhanced Streams Processing with Kotlin’s Sequence Interface

Key Takeaways

  • Kotlin's Sequence interface can be used as an alternative to Java's Stream interface.
  • Sequence has better performance than Stream as well as popular alternatives, such as Guava, Protonpack and Vavr, when it comes to sequential processing.
  • It's possible to create fluent pipelines with Sequence even in Java.
  • Sequence offers a less verbose way to extend its API.
  • Sequence offers a full suite of operations.

Over the past two years, Kotlin has been the fastest-growing language, gaining over 1.1 million developers. It was designed for the JVM and Android, focusing on interoperability, safety and clarity, and was recently appointed by Google as the preferred language for Android development.

With the release of Kotlin 1.4 in August 2020, new features offered improvements mostly focused on quality and performance, including Kotlin’s Sequence interface which may be used as an alternative to Java’s Stream interface, even if you are developing in Java. It provides a full suite of operations and outperforms Stream in a variety of benchmarks for sequential processing.

There are a few well-known third-party libraries that address the limitations of Stream, which stem mainly from its limited set of operations. Guava, Protonpack, Eclipse Collections, jOOλ, StreamEx and Vavr are some of the most used libraries in the Java ecosystem. Yet, none of them achieves better performance than Sequence when it comes to sequential processing.

Since Kotlin is interoperable with the Java programming language, both may coexist in the same code bases. Yet, if you are unable to use Kotlin or prefer to avoid the mix between dialects and use a single programming language, then you can still use Sequence while developing with Java.

In this article, we present two simple shortcuts that let you embrace all the power of Kotlin Sequence in your Java programs without having to manage the Kotlin programming language. We devised some benchmarks, inspired by use cases in kotlin-benchmarks and by this Stack Overflow question that highlighted the limitations of Stream, in order to better understand the performance difference between Stream and some of its state-of-the-art alternatives, while also comparing their features.

Using Kotlin Sequence in Java

Sequence is an interesting choice due to its easy extensibility and ability to be used in Java. To that end, all you need to do is to add:

  1. A library dependency on kotlin-stdlib-1.4.20.jar (1.5 MB library).
  2. [Optional] A Java Sequence wrapper such as Sek or a dependency on sek-1.0.1.jar (11.4 KB library).

This latter point is optional, yet, since Kotlin implements Sequence operations through extension methods, you are unable to use them fluently in Java. Thus, to eliminate that limitation you will need a wrapper that you may include in your code through one of the following choices:

  • Copy and paste this wrapper definition.
  • Add a single dependency on the Sek artifact sek-1.0.1.jar (which already depends on kotlin-stdlib).

Sequence provides a full set of operations out of the box (some of which are absent in the Stream API). However, as previously stated, Sequence methods do not translate to Java as instance methods of the Sequence type. Instead, they are translated as static methods that are available as extension methods in Kotlin. Although these methods can be fluently used in Kotlin, they cannot be fluently chained in Java. Sek eliminates this limitation and allows developers to chain Sequence operations fluently into a pipeline. The following code snippet presents some use cases with operations that are absent in Streams, such as filterNot(), distinctBy() or zip():

Sek<String> songs = Sek.of(
        new Song("505", "Alternative"),
        new Song("Amsterdam", "Alternative"),
        new Song("Mural", "Hip-Hop"))
   .filterNot(song -> song.name().startsWith("A"))
   .map(Song::name);

Sek<String> artists = Sek.of(
        new Artist("Arctic Monkeys", "band"),
        new Artist("Nothing But Thieves", "band"),
        new Artist("Lupe Fiasco", "solo-artist"))
   .distinctBy(Artist::type)
   .map(Artist::name);

songs.zip(artists, (song, artist) -> String.format("%s by %s", song, artist))
   .forEach(System.out::println);

// Output:
// 505 by Arctic Monkeys
// Mural by Lupe Fiasco

Not only that, but extension methods are also available with the ability to be chained into a pipeline through the use of Sek’s then method. For example, consider the following user-defined operation, i.e. oddIndexes, in a Kotlin class named Extensions.kt:

fun <T> Sequence<T>.oddIndexes() = sequence<T> {
    var isOdd = false
    for (item in this@oddIndexes) {
        if (isOdd) yield(item)
        isOdd = !isOdd
    }
}

We could use this function with Sek as follows:

Sek.of("a", "b", "c", "d", "f", "e")
   .then(ExtensionsKt::oddIndexes)
   .forEach(System.out::println);

// Output:
// b
// d
// e

Without Sek

Using Sequence without using a wrapper like Sek is possible, of course, but it has the drawback of losing fluency. Consider the following pipeline example that we evaluate in our benchmarks (explained in detail ahead), to retrieve a sequence of tracks by country. Using Streams, our pipeline would look like this:

Stream<Pair<Country, Stream<Track>>> getTracks() {
    return getCountries()
        .map(country -> Pair.with(country, getTracksForCountry(country)));
}

Using Sequence in Java, however, our pipeline would become nested. This is a consequence of the Sequence interface only exposing the iterator() method, with all other operations made available to Java as static methods. Here’s how our pipeline would look using Sequence in a Java method:

public static Sequence<Pair<Country, Sequence<Track>>> getTracks() {
    return SequencesKt.map(
        getCountries(),
        country -> Pair.with(country, getTracksForCountry(country))
    );
}

Sek eliminates this limitation by maintaining fluent pipelines when using Sequence operations, resulting in a pipeline with the same shape as presented for Stream.

What is Sek?

Sek is an interface that extends Sequence, inheriting the abstract method iterator(). In other words, to define a Sek, we only need to implement the iterator() method. Sek then exposes the operations provided by Sequence through default methods that redirect each call to the corresponding static method in SequencesKt. If the operation is intermediate, we simply return a method reference to the iterator() method of the returned Sequence, which will be inferred as an implementation of Sek.

public interface Sek<T> extends Sequence<T> {
    // ...
    static <T> Sek<T> of(T... elements) {
        return SequencesKt.sequenceOf(elements)::iterator;
    }
    // ...
    default T first() {
        return SequencesKt.first(this);
    }
    // ...
    default <R> Sek<Pair<T,R>> zip(Sequence<R> other) {
        return SequencesKt.zip(this, other)::iterator;
    }
    // ...
}

The above code snippet shows the concept behind Sek. It provides static methods to create new instances and provides intermediate and terminal operations through default methods, returning directly if the operation is terminal, or a method reference to the iterator() of the returned Sequence otherwise. Notice, for example, the implementation of the zip() method that calls the corresponding zip() method of SequencesKt. This returns a new instance of a class implementing Sequence. In this case, it is an instance of the internal class MergingSequence<T1, T2, V> that does not conform with our Sek interface. Yet, both the Sek and Sequence interfaces have a single abstract method iterator() with the same signature, which means they are compatible functional interfaces. Thus, converting from a Sequence object to a Sek only requires mapping the iterator method reference, e.g. ...::iterator.
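The method-reference trick described above can be tried without Kotlin on the classpath. The following is a minimal sketch, not Sek itself: MiniSeq and StaticOps are hypothetical names, java.lang.Iterable stands in for Sequence, and StaticOps plays the role of SequencesKt.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for Sek: its only abstract method is iterator(),
// inherited from Iterable, so it is a functional interface.
interface MiniSeq<T> extends Iterable<T> {

    // Factory: a method reference to List.iterator() is a valid MiniSeq.
    @SafeVarargs
    static <T> MiniSeq<T> of(T... elements) {
        return List.of(elements)::iterator;
    }

    // Intermediate operation: delegate to the static helper, then adapt the
    // plain Iterable it returns with ::iterator.
    default <R> MiniSeq<R> map(Function<? super T, ? extends R> mapper) {
        return StaticOps.map(this, mapper)::iterator;
    }

    // Terminal operation: return the result directly.
    default List<T> toList() {
        List<T> out = new ArrayList<>();
        for (T item : this) out.add(item);
        return out;
    }
}

// Plays the role of SequencesKt: operations exposed as static methods.
final class StaticOps {
    static <T, R> Iterable<R> map(Iterable<T> src, Function<? super T, ? extends R> mapper) {
        return () -> {
            Iterator<T> it = src.iterator();
            return new Iterator<R>() {
                public boolean hasNext() { return it.hasNext(); }
                public R next() { return mapper.apply(it.next()); }
            };
        };
    }
}
```

With this in place, MiniSeq.of(1, 2, 3).map(n -> n * 10).toList() chains fluently even though map() is implemented as a static method elsewhere.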

Feature Comparison

In this section, we present an analysis of the different Sequence alternatives regarding the following features:

  1. Verbosity when extending their respective APIs.
  2. Fluency while using user-defined operations.
  3. Dependency on third parties.
  4. Compatibility with Stream.
  5. Performance.

Extending the functionality of Java’s Stream can be quite verbose. In order to define and use a custom operation, the user needs to define a new way of traversing the sequence through a Spliterator and then use the StreamSupport class to instantiate a new Stream. Let’s say we want to add a zip operation to Stream. One possible implementation would look like this:

public static <A, B, C> Stream<C> zip(Stream<? extends A> a,
        Stream<? extends B> b,
        BiFunction<? super A, ? super B, ? extends C> zipper) {
    Spliterator<? extends A> aSpliterator = Objects.requireNonNull(a).spliterator();
    Spliterator<? extends B> bSpliterator = Objects.requireNonNull(b).spliterator();

    // Zipping loses the DISTINCT and SORTED characteristics
    int characteristics = aSpliterator.characteristics() &
        bSpliterator.characteristics() &
        ~(Spliterator.DISTINCT | Spliterator.SORTED);

    long zipSize = ((characteristics & Spliterator.SIZED) != 0) ?
        Math.min(aSpliterator.getExactSizeIfKnown(),
            bSpliterator.getExactSizeIfKnown()) : -1;

    Iterator<A> aIterator = Spliterators.iterator(aSpliterator);
    Iterator<B> bIterator = Spliterators.iterator(bSpliterator);
    Iterator<C> cIterator = new Iterator<C>() {
        public boolean hasNext() {
            return aIterator.hasNext() && bIterator.hasNext();
        }

        public C next() {
            return zipper.apply(aIterator.next(), bIterator.next());
        }
    };

    Spliterator<C> split = Spliterators.spliterator(cIterator,
            zipSize, characteristics);
    return (a.isParallel() || b.isParallel()) ?
        StreamSupport.stream(split, true) :
        StreamSupport.stream(split, false);
}

This implementation is 29 lines long to define one operation. Not only that, but we can’t chain this operation into our pipelines, effectively trading fluency for extensibility. Looking at a concrete use case, we can go back to our earlier example where we had a sequence of songs and zipped it with a sequence of their respective artists. But this time, we’ll use Stream with our newly defined zip() operation. For the sake of this example, let’s imagine that we defined filterNot() and distinctBy() operations using the same method described above. Here’s how it would look:

Stream<Song> songs = Stream.of(
    new Song("505", "Alternative"),
    new Song("Amsterdam", "Alternative"),
    new Song("Mural", "Hip-Hop"));
Stream<Artist> artists = Stream.of(
    new Artist("Arctic Monkeys", "band"),
    new Artist("Nothing But Thieves", "band"),
    new Artist("Lupe Fiasco", "solo-artist"));

zip(
    filterNot(songs, song -> song.name().startsWith("A")),
    distinctBy(artists, Artist::type),
    (song, artist) -> String.format("%s by %s", song.name(), artist.name())
).forEach(System.out::println);
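The filterNot() and distinctBy() helpers imagined above could also be written more simply than the Spliterator route. The following is a minimal sketch under that assumption; StreamExtras is a hypothetical class name, and distinctBy() relies on a stateful predicate, so it is only meant for sequential streams.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical static helpers in the style used by the pipeline above.
final class StreamExtras {

    // Keeps the elements that do NOT satisfy the predicate.
    static <T> Stream<T> filterNot(Stream<T> source, Predicate<? super T> predicate) {
        return source.filter(predicate.negate());
    }

    // Keeps the first element seen for each key.
    // The HashSet makes the predicate stateful: sequential streams only.
    static <T, K> Stream<T> distinctBy(Stream<T> source, Function<? super T, ? extends K> keyOf) {
        Set<K> seen = new HashSet<>();
        return source.filter(item -> seen.add(keyOf.apply(item)));
    }
}
```

Because both helpers take the source stream as an argument, combining them still reads inside-out, for example distinctBy(filterNot(words, w -> w.isEmpty()), String::length).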

Each time we use a custom operation, we have to nest our pipelines within the operation, losing fluency in the process. To address these drawbacks, we can add a third-party library such as Sek that provides all required operations. Considering that the previous songs and artists are of type Sek, we could rewrite the previous pipeline as:

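songs
    .filterNot(song -> song.name().startsWith("A"))
    .zip(artists.distinctBy(Artist::type))
    .map(pair -> String.format("%s by %s", pair.getFirst().name(), pair.getSecond().name()))
    .forEach(System.out::println);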

But this approach may have drawbacks or may not be a viable option. Another concern with adding third-party libraries is the ability to easily switch from a third-party sequence type to Stream if the use case deems it necessary.

Lastly, we discuss performance with nine benchmark results that demonstrate the speedup relative to Stream for five different query pipelines explained in detail ahead.



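[Table: feature and performance comparison of the alternatives. Surviving column labels: Stream ext., Kotlin Sequence, Eclipse Collections. Surviving row labels: 3rd Party Free, Stream Compatible. Surviving cell notes: "With Sek Wrapper", "Through StreamSupport".]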

Comparing these approaches, Stream extensibility, jOOλ and Eclipse Collections are all verbose when it comes to extending their respective APIs. Sequence is easy to extend by combining the yield keyword with extension methods in Kotlin. Jayield uses its then() method, and Vavr provides a recursive way of extension with the help of the headOption(), tail() and prepend() methods, recursively redefining the sequence. Sequence and Jayield are also the only options here that provide a fluent way of using the methods added to their APIs. Stream extensibility, jOOλ, Vavr and Eclipse Collections break the pipeline fluency and become nested when using user-defined operations. Looking at performance, Sequence is the fastest option in most use cases, in some cases delivering more than three times the performance of Stream. Jayield is the only alternative that never falls behind Stream in any of the benchmarked use cases, and Vavr is the slowest of the alternatives tested.


Our benchmarks were inspired by a few real-world use cases:

  1. Custom Operations - Every and Find
  2. Public Databases - REST Countries and
  3. Pipelines with custom operations.

We discuss each of these use cases in detail. All of these benchmarks (and others) are available in this GitHub repository.

Custom Operations - Every and Find

The custom operations, Every and Find, were based on this Stack Overflow question on how to zip Streams in Java. The question also discussed how significant the lack of a zip operation in Stream was. Our benchmark leveraged some ideas from kotlin-benchmarks.

Every is an operation that, based on a user-defined predicate, tests whether all the elements of two sequences match at corresponding positions. The following code snippet shows how the first call to every() returns true as all the strings match, while the second returns false because, even though both Streams have the same elements, the order of the elements is not the same:

Stream<String> seq1 = Stream.of("Nightcall", "Thunderstruck", "One");
Stream<String> seq2 = Stream.of("Nightcall", "Thunderstruck", "One");
Stream<String> seq3 = Stream.of("One", "Thunderstruck", "Nightcall");
BiPredicate<String, String> pred = (s1, s2) -> s1.equals(s2);

// Illustrative only: a Stream can be consumed once, so real code needs fresh streams per call.
every(seq1, seq2, pred); // returns true
every(seq1, seq3, pred); // returns false

For every() to return true, every element of each sequence must match in the same index. To add the every() operation, we simply combine the zip() and allMatch() operations in sequence, such as: zip(seq1, seq2, pred::test).allMatch(Boolean.TRUE::equals);
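As a rough, JDK-only sketch of the Every semantics described above (not the benchmarked implementation), the pairing can also be done directly over two iterators. The class name EveryOp is hypothetical, and stopping at the shorter stream is an assumption that mirrors how zip() behaves.

```java
import java.util.Iterator;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

// Hypothetical helper: tests whether elements at the same index all satisfy the predicate.
final class EveryOp {
    static <T, U> boolean every(Stream<T> left, Stream<U> right,
                                BiPredicate<? super T, ? super U> pred) {
        Iterator<T> a = left.iterator();
        Iterator<U> b = right.iterator();
        // Like zip(), pairing stops as soon as either side is exhausted.
        while (a.hasNext() && b.hasNext()) {
            if (!pred.test(a.next(), b.next())) {
                return false; // short-circuit on the first mismatch
            }
        }
        return true;
    }
}
```

Each call consumes both streams, so callers need to pass freshly created streams every time.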

Find is an operation between two sequences that, based on a user-defined predicate, finds two elements that match, returning one of them in the process. In this next code snippet, the first call to find() returns “Thunderstruck” as it is the first element of the two input Streams that matches the predicate in the same index, the second call returns null as no match is made, and the third call returns “Toxicity” as expected.

Stream<String> seq1 = Stream.of("Nightcall", "Thunderstruck", "One");
Stream<String> seq2 = Stream.of("Du hast", "Thunderstruck", "Toxicity");
Stream<String> seq3 = Stream.of("Thunderstruck", "One", "Toxicity");
BiPredicate<String, String> pred = (s1, s2) -> s1.equals(s2);

// Illustrative only: a Stream can be consumed once, so real code needs fresh streams per call.
find(seq1, seq2, pred); // returns "Thunderstruck"
find(seq1, seq3, pred); // returns null
find(seq2, seq3, pred); // returns "Toxicity"

For find() to return an element, two elements of each sequence must match in the same index. Adding the find() operation, therefore, consists of using the zip() method to return an element if a match is made or null otherwise, and finally returning the first match made, or null if none is found, like so: zip(seq1, seq2, (t1, t2) -> predicate.test(t1, t2) ? t1 : null).filter(Objects::nonNull).findFirst().orElse(null);
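The same Find behavior can be sketched with plain iterators, which avoids carrying null placeholders through the pipeline. This is an illustrative JDK-only version, not the benchmarked code; FindOp is a hypothetical name.

```java
import java.util.Iterator;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

// Hypothetical helper: returns the first left element whose same-index
// counterpart on the right satisfies the predicate, or null if none does.
final class FindOp {
    static <T> T find(Stream<T> left, Stream<T> right,
                      BiPredicate<? super T, ? super T> pred) {
        Iterator<T> a = left.iterator();
        Iterator<T> b = right.iterator();
        while (a.hasNext() && b.hasNext()) {
            T candidate = a.next();
            if (pred.test(candidate, b.next())) {
                return candidate; // stop at the first matching pair
            }
        }
        return null;
    }
}
```

The later the match index, the more pairs are visited, which is why the benchmark described next varies the match position.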

Every and Find are very similar to each other, in the sense that both operations zip() sequences using a predicate. If Find matches on the last element, it runs through the entire sequences as Every would. For this reason, we decided that benchmarking the find() operation, matching it only in the last element, would not add much value to this analysis. Instead, we devised a benchmark in which the match index would vary from the first index up until the last and analysed sequences with only 1000 elements. On the other hand, we have benchmarked the every() operation with sequences of 100,000 elements.

Public Databases - REST Countries and

To benchmark use cases with real-world data, we resorted to publicly available Web APIs, namely REST Countries and We retrieved from REST Countries a list of 250 countries and then used them to query, retrieving both the top Artists and the top Tracks by country, resulting in a total of 7500 records each.

The domain model for these benchmarks can be summarized by the following classes: Country, Language, Track, and Artist.

We devised two benchmarks using this data, “Distinct Top Artist and Top Track by Country” identified in the table above as “Distinct”, and “Artists Who Are in A Country’s Top Ten Who Also Have Tracks in The Same Country’s Top Ten” identified as “Filter”.

Both these benchmarks start off the same way. We first query all the countries, filter the non-English speaking countries and, from these, we retrieve two sequences: one pairing Country with its top Tracks and another pairing Country with its top Artists. The following code snippet shows the methods used for the retrieval of both these sequences:

Sequence<Pair<Country, Sequence<Track>>> getTracks() {
    return getCountries()
        .map(country -> Pair.with(country, getTracksForCountry(country)));
}

Sequence<Pair<Country, Sequence<Artist>>> getArtists() {
    return getCountries()
        .map(country -> Pair.with(country, getArtistsForCountry(country)));
}

From here on out these two benchmarks diverge.

Distinct - Distinct Top Artist and Top Track by Country

This benchmark uses the zip() method to combine the sequences retrieved from the methods described above into a Trio instance with the Country, its top Artist and its top Track, and then selects the distinct entries by Artist.

Sequence<Trio<Country,Artist,Track>> zipTopArtistAndTrackByCountry() {
    return getArtists()
        .map(pair -> Trio.with(

Filter - Artists Who Are in A Country’s Top Ten Who Also Have Tracks in The Same Country’s Top Ten

Not unlike the previous benchmark, this benchmark also uses the zip() method on both sequences, but this time, for each Country object, it takes the top ten artists and the artists' names of the top ten Tracks, combining them into an instance of Trio. Afterwards, from the top ten artists, we filter those that also have tracks in the top ten of that same country, returning a Pair object with the country and the remaining artists.

Sequence<Pair<Country,Sequence<Artist>>> artistsInTopTenWithTopTenTracksByCountry() {
    return getArtists()
            .map(pair -> Trio.with(
            .map(trio -> Pair.with(
                        .filter(artist -> trio.tracksArtists.contains(

Pipelines with Custom Operations

Lastly, we benchmarked interleaving user-defined operations with the ones already in each library. To do so, we used data from WorldWeatherOnline. For these benchmarks, we created two custom operations: oddLines and collapse. Both of these operations are quite simple: The oddLines() method simply lets through elements in odd indexes of the sequence to which it is applied, while the collapse() method coalesces adjacent equal elements into one as the following code snippet exemplifies:

Sequence<String> days = Sequence.of(

Sequence<String> temperatures = Sequence.of( "22ºC", "23ºC", "23ºC", "24ºC", "22ºC", "22ºC", "21ºC" );


days.oddLines();
// Output
// "2020-05-09"
// "2020-05-11"

temperatures.collapse();
// Output
// "22ºC", "23ºC", "24ºC", "22ºC", "21ºC"
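To make the two operations concrete, here is a small JDK-only sketch. It is eager and list-based for brevity, whereas the benchmarked operations are lazy sequence operations; CustomOps is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, eager versions of the two custom operations.
final class CustomOps {

    // Lets through only the elements at odd indexes (1, 3, 5, ...).
    static <T> List<T> oddLines(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        boolean isOdd = false;
        for (T item : source) {
            if (isOdd) out.add(item);
            isOdd = !isOdd;
        }
        return out;
    }

    // Coalesces runs of adjacent equal elements into a single element.
    static <T> List<T> collapse(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        boolean first = true;
        T previous = null;
        for (T item : source) {
            if (first || !Objects.equals(previous, item)) out.add(item);
            previous = item;
            first = false;
        }
        return out;
    }
}
```

Counting temperature transitions then amounts to taking the size of the collapsed list.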

We then queried WorldWeatherOnline for the weather in Lisbon, Portugal between the dates of 2020-05-08 and 2020-11-08, providing us with a CSV file that we manipulated with the operations above in a benchmark to query the number of temperature transitions.

Collapse - Query number of temperature transitions

In this benchmark, we manipulate the data set in order to count the number of temperature transitions. To do that, we first filter all the comments from the CSV file, then skip one line that has “Not available” written in it. Then we use the oddLines() method to let through only the hourly info and then map to the temperature on that line. Finally, we use the collapse() method to coalesce adjacent equal elements into one, leaving us with all the transitions, followed by a call to the count() method to retrieve the number of transitions.

    .filter(s -> s.charAt(0) != '#') // filter comments
    .skip(1)   // skip line: Not available
    .oddLines() // filter hourly info
    .mapToInt(line -> parseInt(line.substring(14, 16))) // map to Temperature data

Performance Evaluation

To avoid I/O operations during benchmark execution, we collected all data into resource files beforehand and load it into in-memory data structures on benchmark bootstrap. Thus, we avoid any I/O by providing the sequence sources from memory.

To compare the performance of the multiple approaches described above, we benchmarked these queries with JMH, a Java harness for building, running, and analysing benchmarks that target the JVM. We ran them using these GitHub Actions and on our local machine, which has the following specs:

Microsoft Windows 10 Home
10.0.18363 N/A Build 18363
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, 2801 Mhz, 4 Core(s), 8 Logical Processor(s)
openjdk 15.0.1 2020-10-20
OpenJDK Runtime Environment (build 15.0.1+9-18)
OpenJDK 64-Bit Server VM (build 15.0.1+9-18, mixed mode, sharing)

We also used the following options when we ran these benchmarks with jmh:

-i 4 -wi 4 -f 1 -r 2 -w 2 -tu s --jvmArgs "-Xms6G -Xmx6G"

Which correspond to the following configuration:

  • -i 4 - run 4 measurement iterations.
  • -wi 4 - run 4 warmup iterations.
  • -f 1 - fork each benchmark once.
  • -r 2 - spend at least 2 seconds at each measurement iteration.
  • -w 2 - spend at least 2 seconds at each warmup iteration.
  • -tu s - set the benchmark time unit to seconds.
  • --jvmArgs "-Xms6G -Xmx6G" - set the initial and the maximum Java heap size to 6 GB.

You may check the output log of the GitHub Actions execution here.

Observing the results presented in these charts, Sequence, Sek, Jayield and Eclipse Collections all outperform Stream in most use cases. On the other end of the spectrum we have jOOλ and Vavr, both of which fell short performance-wise in most, if not all, benchmarks when compared to Stream. We can also observe that adding a wrapper to Sequence did not hinder its performance, as the performance of Sek is on par with it.

From our investigation into Kotlin’s performance advantage over Stream, we identified a few advantages that Kotlin has, namely that operations that would return an Optional in Stream return a nullable value in Kotlin. In other words, Kotlin doesn’t create an additional wrapper, which results in less overhead. Not only that, Kotlin’s terminal operations are inlined, creating one less lambda and removing indirection.

Jayield’s performance gains come from the fast-path iteration protocol that has less overhead when bulk traversing a sequence than Stream does.

Eclipse Collections has many optimizations in place regarding the data source of the pipeline: if an array is at the source, iteration is as fast as using a for loop. On the other hand, it performs worse in short-circuiting operations because it processes every operation in bulk. For a pipeline consisting of source.filter(...).findFirst(), Eclipse Collections will first calculate the result of source.filter(...) and then access the first element of the resulting sequence. This may lead to a lot of unnecessary processing.
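The cost of bulk processing in a short-circuiting pipeline can be made visible by counting predicate evaluations. The sketch below uses only the JDK: the eager variant imitates bulk processing by collecting the filtered result first, and it does not use the Eclipse Collections API. ShortCircuitDemo is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical demo: counts how often the filter predicate runs.
final class ShortCircuitDemo {

    // Lazy: elements are pulled one at a time and traversal stops at the first match.
    static int lazyEvaluations(List<Integer> source) {
        AtomicInteger calls = new AtomicInteger();
        source.stream()
              .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; })
              .findFirst();
        return calls.get();
    }

    // Eager: the whole source is filtered before the first element is read.
    static int eagerEvaluations(List<Integer> source) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> filtered = source.stream()
              .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; })
              .collect(Collectors.toList());
        Integer first = filtered.isEmpty() ? null : filtered.get(0);
        return calls.get();
    }
}
```

For a source of 1 to 6, the lazy variant evaluates the predicate twice, while the eager variant evaluates it for all six elements.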

Source Code and Benchmark results

For the actual implementations of these queries and benchmark runs you can always check our GitHub repository as well as the Java wrapper of Sequence repository Sek.


Java's Stream is a masterpiece of software engineering, allowing querying and traversal of sequences, sequentially or in parallel. Nevertheless, it does not contain, nor could it, every single operation that will be needed for every use case.

In this article, we argue that Sequence could very well be an alternative for developers who need a more complete suite of operations, easy extensibility and a performance boost, at the cost of adding a dependency on Kotlin and giving up the sequence partitioning for automatic parallel processing that Streams provide. However, not every developer or team will be willing to learn Kotlin, let alone migrate their Java projects to this language. This is where Sek comes in handy, providing the full suite of operations that Sequence provides without leaving the Java ecosystem, maintaining pipeline fluency, and still offering a boost in performance over Stream sequential processing for most use cases.

As we previously stated, these benchmarks are available on this GitHub repository and the results are publicly visible in GitHub Actions. Also, you may find two ready-to-use repositories showing both Sek usage approaches proposed in this article: sek-usage and sek-usage-lib.

About the Authors

Diogo Poeira is a Software Development Engineer at Infinera in Lisbon, with 4 years of experience developing full-stack applications to supervise and manage telecommunications networks. In 2016 he received his bachelor's degree in Computer Engineering from the Polytechnic Institute of Lisbon (IPL). For his final project he designed and developed a web portal for the Garcia de Orta Hospital that provided a way for pharmacists to assign prescriptions to special needs cases. He is currently finishing his Master's degree at ISEL (Instituto Superior de Engenharia de Lisboa) through his work on sequence traversal optimization and extensibility.

Miguel Gamboa is an Assistant Professor of Computer Engineering at the Polytechnic Institute of Lisbon (IPL) and a researcher at CCISEL and INESC-ID. He is the author of several open-source libraries, such as HtmlFlow and javasync/RxIo. He started his professional career in 1997 and later worked as project manager at Altitude Software and then Quatro SI. He also worked at the Madrid and São Paulo (Brazil) offices. In 2014 he received his Ph.D. degree for his work (STMs for large-scale programs), which aims to provide an efficient alternative to shared-memory synchronization in modern runtime environments (such as the JVM and .NET). He received the Best Paper Award of ICA3PP 2013 for his PhD work and was granted the excellence award by IPL in 2019 for the HtmlFlow project.

