and even then, only the elements that are consumed are computed.[^inf]
This makes it possible to operate on huge data sets while keeping memory usage low,
because only the currently active element has to be held in memory.
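To make the laziness concrete, here is a small Java sketch. The class `LazyDemo` and its counter are made up for illustration. It maps over an infinite `LongStream` and counts how often the mapping function actually runs. Because `limit(n)` short-circuits the pipeline, `map` should run only for the `n` elements that are consumed, even though the source never ends.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;

class LazyDemo {
    // Counts how many times the mapping function has actually run.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // Doubles the first n numbers of an infinite stream and sums them.
    static long firstDoubled(int n) {
        mapCalls.set(0);
        return LongStream.iterate(1, i -> i + 1) // infinite source: 1, 2, 3, ...
                .map(i -> {
                    mapCalls.incrementAndGet(); // record every element that gets computed
                    return i * 2;
                })
                .limit(n) // short-circuits: nothing past the n-th element is requested
                .sum();
    }
}
```

`LazyDemo.firstDoubled(5)` returns 30 (2 + 4 + 6 + 8 + 10). Afterwards, `mapCalls` should read 5.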
[^inf]: This is what makes infinite iterators possible.
They can be very useful and are used a lot in functional languages,
but they’re not today’s topic.
## Cold, hard numbers
We’ll use the small example from the last section as our first benchmark:
take a range of numbers, double each number, and compute the sum –
except this time, we’ll do the numbers from 1 to 1 billion.
Since everything we’re doing is lazy, memory usage shouldn’t be an issue.
I implemented this in several different ways and benchmarked each of them.
Here are the different approaches I came up with:
- a simple for loop in Java
- Java’s LongStream
- a for-each loop over a range in Kotlin
- Java’s LongStream called from Kotlin[^ktjava]
- Java’s LongStream wrapped in a Kotlin Sequence
- a Kotlin range wrapped in a Sequence
- Kotlin’s Sequence with a generator to create the range
[^ktjava]: Mainly to make sure there is no performance difference between the two.
The benchmarks were executed on an Intel Xeon E3-1271 v3 with 32 GB of RAM,
running Arch Linux with kernel 5.4.20-1-lts,
using the (at the time of writing) latest OpenJDK preview build (`15-ea+17-717`),
Kotlin 1.4-M1, and [jmh](https://openjdk.java.net/projects/code-tools/jmh/) version 1.23.
The bytecode target was set to Java 15 for the Java code and to Java 13 for Kotlin
(newer targets are not yet supported by the Kotlin compiler).
Source code for the Java tests:
```java
public long stream() {
    return LongStream.range(1, upper)
            .map(l -> l * 2)
            .sum();
}

public long loop() {
    long sum = 0;
    for (long i = 0; i < upper; i++) {
        sum += i * 2;
    }
    return sum;
}
```
and for Kotlin:
```kotlin
import java.util.stream.LongStream
import kotlin.streams.asSequence

fun stream() =
    LongStream.range(1, upper)
        .map { it * 2 }
        .sum()

fun loop(): Long {
    var sum = 0L
    for (l in 1L until upper) {
        sum += l * 2
    }
    return sum
}

fun streamWrappedInSequence() =
    LongStream.range(1L, upper)
        .asSequence()
        .map { it * 2 }
        .sum()

fun sequence() =
    (1 until upper).asSequence()
        .map { it * 2 }
        .sum()

fun withGenerator() =
    generateSequence(0L) { it + 1L }
        .take(upper.toInt())
        .map { it * 2 }
        .sum()
```
with `const val upper = 1_000_000_000L`.^[`1 until upper` is used in these examples because, unlike `1..upper`, `until` is end-exclusive, just like Java’s `LongStream.range()`.]
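Every variant sums the doubled values from 1 up to (but excluding) `upper`, so they should all return the same number. That number has a closed form: 2 · (1 + 2 + … + (n − 1)) = n · (n − 1). For n = 1 billion this is 999,999,999,000,000,000, which is well below `Long.MAX_VALUE`, so the sum cannot overflow. Starting at 0 instead of 1, as the Java loop and `withGenerator()` do, only adds a zero term. The following Java check uses a hypothetical class `RangeCheck`. It also shows the exclusive/inclusive distinction between `LongStream.range()` and `LongStream.rangeClosed()`, which mirrors Kotlin’s `until` and `..`.

```java
import java.util.stream.LongStream;

class RangeCheck {
    // End-exclusive, like Kotlin's `until`: range(1, 5) yields 1, 2, 3, 4.
    static long[] exclusive() {
        return LongStream.range(1, 5).toArray();
    }

    // End-inclusive, like Kotlin's `..`: rangeClosed(1, 5) yields 1, 2, 3, 4, 5.
    static long[] inclusive() {
        return LongStream.rangeClosed(1, 5).toArray();
    }

    // Same shape as the benchmark: double every value in [1, upper) and sum.
    static long bruteForce(long upper) {
        return LongStream.range(1, upper).map(l -> l * 2).sum();
    }

    // Closed form of the same sum: 2 * (1 + 2 + ... + (upper - 1)) = upper * (upper - 1).
    static long closedForm(long upper) {
        return upper * (upper - 1);
    }
}
```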
Without wasting any more of your time, here are the results:
Unsurprisingly, calling `LongStream` from Java and from Kotlin performs almost identically.
The same is true for imperative loops,
meaning Kotlin ranges introduce no overhead compared to incrementing for loops.
More surprisingly, using Sequences is an order of magnitude slower.
That was not at all what I expected, so I investigated.
As it turns out, Java’s `LongStream` exists because `Stream<Long>` is *much* slower.
The JVM has to use `Long` rather than `long` when the type is used as a generic type argument,
which involves an additional boxing step and the allocation of a `Long` object.[^primitives]
Still, we now know that Streams have about 25% overhead compared to the simple loop for this example,
that generating sequences is a comparatively slow process,
and that wrapping Streams comes at a considerable cost (compared to a sequence created from a range).
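The difference is easiest to see side by side. In this Java sketch, both methods compute the same sum. The class `BoxingComparison` is hypothetical. The only change is that `boxed()` turns the `LongStream` into a `Stream<Long>`. From that point on, every element is a `Long` object, and each arithmetic step unboxes it and boxes the result again.

```java
import java.util.stream.LongStream;

class BoxingComparison {
    // Stays on primitive longs the whole way: no Long objects are created.
    static long primitive(long upper) {
        return LongStream.range(1, upper)
                .map(l -> l * 2)
                .sum();
    }

    // boxed() wraps every element in a Long (via Long.valueOf); the lambda unboxes it,
    // multiplies, and the result is boxed again before reduce unboxes it once more.
    static long boxed(long upper) {
        return LongStream.range(1, upper)
                .boxed()
                .map(l -> l * 2)
                .reduce(0L, Long::sum);
    }
}
```

Both return 999000 for `upper = 1000`. Only the allocation behavior differs.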
[^primitives]: The JVM has a few primitive types, such as `int`, `long`, or `char`.
They are different from any other type because they cannot be `null`.
Every regular type on the JVM extends `java.lang.Object` and is just a reference that is being passed around.
The primitives are values, not references, so there’s a lot less overhead involved.
Unfortunately, primitives can’t be used as generic types,
so a list of longs will always convert the `long` to `Long` before adding it.
The cost of wrapping Streams seemed odd, so I attached a profiler to see where the CPU time goes.
![Flamegraph of `streamWrappedInSequence()`](https://i.kageru.moe/knT2Eg.png)
We can see that the `LongStream` can produce a `PrimitiveIterator.OfLong` that is used as a source for the Sequence.
The operation of boxing a primitive `long` into an object `Long`
(that’s the `Long.valueOf()` step) takes almost as long as advancing the underlying iterator itself.
7.7% of the CPU time is spent in `Sequence.hasNext()`.
The exact breakdown of that looks as follows:
![Checking if a Sequence has more elements](https://i.kageru.moe/k4NHhR.png)
The Sequence introduces very little overhead here, as it just delegates to `hasNext()` of the underlying iterator.
Worth noting is that the iterator calls `tryAdvance()` on the underlying spliterator as part of `hasNext()`.
This already advances the spliterator and passes the new value to `accept()`,
where it is stored temporarily until `nextLong()` is called.
```java
public boolean tryAdvance(LongConsumer consumer) {
    final long i = from;
    if (i < upTo) {
        from++;
        consumer.accept(i);
        return true;
    }
    // more stuff down here
}
```
where `consumer.accept()` is
```java
public void accept(long t) {
    valueReady = true;
    nextElement = t;
}
```
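The interplay between `hasNext()`, `tryAdvance()`, and `accept()` can be reproduced in a few lines. This is a simplified sketch modeled on the behavior described above. It is not a copy of the JDK’s adapter, and the names `SketchAdapter` and `sumBoxed` are hypothetical. Note where the boxing happens: the default `next()` of `PrimitiveIterator.OfLong` returns a `Long`. That is the step a Kotlin `Sequence<Long>` cannot avoid.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.function.LongConsumer;
import java.util.stream.LongStream;

// Simplified spliterator-to-iterator adapter (hypothetical, for illustration only).
class SketchAdapter implements PrimitiveIterator.OfLong, LongConsumer {
    private final Spliterator.OfLong spliterator;
    private boolean valueReady = false; // true while nextElement holds an unconsumed value
    private long nextElement;

    SketchAdapter(Spliterator.OfLong spliterator) {
        this.spliterator = spliterator;
    }

    @Override
    public void accept(long t) { // called by tryAdvance() with the next value
        valueReady = true;
        nextElement = t;
    }

    @Override
    public boolean hasNext() {
        // The actual advancing happens here: tryAdvance() pushes the value into accept().
        if (!valueReady) {
            spliterator.tryAdvance(this);
        }
        return valueReady;
    }

    @Override
    public long nextLong() {
        if (!valueReady && !hasNext()) {
            throw new NoSuchElementException();
        }
        valueReady = false; // hand out the cached value exactly once
        return nextElement;
    }

    // Consumes the iterator through the boxing next(), like a Sequence<Long> would.
    static long sumBoxed(long upper) {
        SketchAdapter it = new SketchAdapter(LongStream.range(1, upper).spliterator());
        long sum = 0;
        while (it.hasNext()) {
            Long boxed = it.next(); // default next() boxes the result of nextLong()
            sum += boxed;
        }
        return sum;
    }
}
```

Calling `hasNext()` twice in a row does not skip an element, because the second call finds `valueReady` already set.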
Knowing this, I have to wonder why `nextLong()` takes as long as it does.
Looking at [the implementation](https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/src/java.base/share/classes/java/util/Spliterators.java#L756),
I don’t understand where all that time is going.
`hasNext()` should always be called before `nextLong()`,
so `nextLong()` just has to return a precomputed value.
Nevertheless, the additional boxing step explains the performance difference.
Primitives good; everything else bad.
With that in mind, I wrote a second test to compare Streams and Sequences without the boxing issue skewing the results.
It runs a few operations on a Stream/Sequence of a simple wrapper class,
which guarantees that neither side gets to work with primitives.
I’ll use this opportunity to also compare parallel and sequential streams.
The steps are simple:
take a long, create a `LongWrapper` from it, double the contained value (which creates a new `LongWrapper`), extract the value, and compute the sum.
That may sound overcomplicated,
but it’s sadly close to the reality of enterprise code.
Wrapper types are everywhere.
```kotlin
inner class LongWrapper(val value: Long) {
    fun double() = LongWrapper(value * 2)
}

fun sequence(): Long =
    (1 until upper).asSequence()
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .sum()

fun stream(): Optional<Long> =
    StreamSupport.stream((1 until upper).spliterator(), false)
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .reduce(Long::plus)

fun parallelStream(): Optional<Long> =
    StreamSupport.stream((1 until upper).spliterator(), true)
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .reduce(Long::plus)
```
Full results are in the [JMH log from earlier](https://ruru.moe/pSK13p8).
The overhead of Java Streams is much higher than that of Kotlin Sequences,
and even a parallel Stream is slower than a Sequence,
even though Sequences only use a single thread.
Both, however, are miles behind the simple for loop.
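The loop variant isn’t part of the snippet above. In Java, it might look roughly like the sketch below. It is hypothetical: `WrapperLoop` is a made-up name, and the method is called `doubled()` because `double` is a reserved word in Java. In the source code, each iteration allocates two wrapper objects. Whether those allocations survive at runtime is up to the JIT.

```java
class WrapperLoop {
    // Java counterpart of the LongWrapper class above; `double` is reserved, hence doubled().
    static final class LongWrapper {
        final long value;

        LongWrapper(long value) {
            this.value = value;
        }

        LongWrapper doubled() {
            return new LongWrapper(value * 2);
        }
    }

    // Same steps as the Stream/Sequence versions: wrap, double, unwrap, sum.
    static long loop(long upper) {
        long sum = 0;
        for (long i = 1; i < upper; i++) {
            // Two short-lived allocations per iteration as written; after inlining, the JIT
            // may prove they never escape and remove them, but that is not guaranteed.
            sum += new LongWrapper(i).doubled().value;
        }
        return sum;
    }
}
```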
My first assumption was that the compiler optimized away the wrapper type and just added the longs,
but looking at [the bytecode](https://p.kageru.moe/AUJKiG),
the constructor invocation and the `double()` method calls are still there.
It’s hard to know what the JIT does at runtime,
but the numbers certainly suggest that the wrapper is simply optimized away.
The profiler report wasn’t helpful either,
which further leads me to believe that the JIT inlines the method calls and eliminates the wrapper objects entirely.
This tells us that not only do Streams/Sequences have a very measurable overhead,
but they severely limit the optimizer’s
(be it compile-time or JIT)
ability to understand the code,
which can lead to significant slowdowns in code that would otherwise optimize well.
Obviously, code that doesn’t rely on the optimizer as much won’t be affected to the same degree.
## Conclusion
Overall, I think that Kotlin’s Sequences are a good addition to the language.
They fall behind Streams when working with primitives
because the Java standard library has primitive specializations of many generic constructs (such as `LongStream` or `PrimitiveIterator.OfLong`),
but in most real-world JVM applications (that being enterprise-level bloatware),
primitives are the exception rather than the rule.
That said, Kotlin does have some types that optimize for this, such as `LongIterator`,
but without a `LongSequence` to go with it,
the boxing will still happen eventually,
and all the performance gains are lost.
I hope that we can get a few more types like it in the future,
which will be especially useful once Kotlin/Native reaches maturity
and starts being used for small/embedded hardware.
Apart from the performance, Sequences are also a lot easier to understand and even extend than Streams.
Implementing your own Sequence requires barely more than an implementation of the underlying iterator,
as can be seen in [CoalescingSequence](https://git.kageru.moe/kageru/Sekwences/src/branch/master/src/main/kotlin/moe/kageru/sekwences/CoalescingSequence.kt)
which I implemented last year to get a feeling for how all of this works.
Streams, on the other hand, are a lot more complex. Internally, a Stream pipeline is a chain of `Sink`s, and `Sink<T>` extends `Consumer<T>`,
so each stage is essentially a `void accept(T t)` that the stage before it calls repeatedly.
That makes it a lot harder to grasp where data is coming from and how it is requested, at least to me.
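The push model can be illustrated with a toy pipeline in Java. This is a hypothetical sketch (`PushPipeline`, `map`, and `Summer` are made-up names) and deliberately not the JDK’s internal `Sink` API. Each stage is a `LongConsumer` that forwards values to the next stage, and the source drives the whole chain by calling `accept()` on the first stage. A Sequence works the other way around: the consumer pulls values through `hasNext()`/`next()`.

```java
import java.util.function.LongConsumer;
import java.util.function.LongUnaryOperator;

// Toy push-based pipeline: every stage is a consumer that forwards to its downstream.
class PushPipeline {
    // A "map" stage: transforms each incoming value and pushes it downstream.
    static LongConsumer map(LongUnaryOperator f, LongConsumer downstream) {
        return value -> downstream.accept(f.applyAsLong(value));
    }

    // Terminal stage that accumulates a sum.
    static final class Summer implements LongConsumer {
        long sum = 0;

        @Override
        public void accept(long value) {
            sum += value;
        }
    }

    static long doubleAndSum(long upper) {
        Summer terminal = new Summer();
        LongConsumer head = map(x -> x * 2, terminal);
        // The source is in control: it pushes each element into the head of the chain.
        for (long i = 1; i < upper; i++) {
            head.accept(i);
        }
        return terminal.sum;
    }
}
```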
Simplicity is often underrated in software, but I consider it a huge plus for Sequences.
I will continue to use them liberally,
unless I find myself in a situation where I need to process a huge number of primitives.
And even then, I now know that Java’s Streams are a good choice.
25% might sound like a lot,
but it’s more than worth it if it means leaving code that is much easier to understand and modify for the next person.
Unless you’re actually in a very performance-critical part of your application,