release iterators

kageru 2020-04-12 15:05:46 +02:00
parent ba11ce21df
commit 000a8fcdae
Signed by: kageru
GPG Key ID: 8282A2BEA4ADA3D2

@ -21,7 +21,7 @@ I was personally interested in this because, being a Java/Kotlin developer,
Still, I wanted to know how they compare to imperative code. Still, I wanted to know how they compare to imperative code.
There are some resources on this for Java 8’s Stream API, There are some resources on this for Java 8’s Stream API,
but Kotlin’s Sequences seem to just be accepted as but Kotlin’s Sequences seem to just be accepted as
more convenient Streams[^convenience]. more convenient Streams, without much discussion about their performance.[^convenience]
[^convenience]: If you’ve ever used them, you’ll know what I mean. [^convenience]: If you’ve ever used them, you’ll know what I mean.
Java’s Streams are built in a way that allows for easy parallelism, Java’s Streams are built in a way that allows for easy parallelism,
@ -34,7 +34,7 @@ It lets you write code as a sequence of instructions to be applied to all elemen
Let’s use a simple example to demonstrate this. Let’s use a simple example to demonstrate this.
We want to take all numbers from 1 to 100,000, We want to take all numbers from 1 to 100,000,
multiply each of them by 2, multiply each of them by 2,
and then add all of them.[^sum] and then sum all of them.[^sum]
[^sum]: You could also just compute the sum and take that \* 2, but we specifically want that intermediate step for the example. [^sum]: You could also just compute the sum and take that \* 2, but we specifically want that intermediate step for the example.
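For reference, here is a minimal Java sketch of that example, written once as a plain loop and once as a primitive stream pipeline. The class and method names (`DoubledSum`, `withLoop`, `withStream`) are made up for this illustration and are not taken from the benchmark code.

```java
import java.util.stream.LongStream;

class DoubledSum {
    // Imperative version: double each number and add it to a running total.
    static long withLoop(long upTo) {
        long sum = 0;
        for (long i = 1; i <= upTo; i++) {
            sum += i * 2;
        }
        return sum;
    }

    // Pipeline version: the same steps expressed on a primitive LongStream.
    static long withStream(long upTo) {
        return LongStream.rangeClosed(1, upTo)
                .map(n -> n * 2)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(withLoop(100_000));   // 10000100000
        System.out.println(withStream(100_000)); // 10000100000
    }
}
```

Both versions keep the intermediate doubling step on purpose, which is the part the footnote says could otherwise be folded into a single multiplication at the end.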
@ -56,7 +56,7 @@ return (1..100_000).asSequence()
.sum() .sum()
``` ```
An iterator not a list, and it doesn’t support indexing,[^index] An iterator is not a list, and it doesn’t support indexing,[^index]
because it doesn’t actually contain any data. because it doesn’t actually contain any data.
It just knows how to get or compute it for you, It just knows how to get or compute it for you,
but you don’t know how it does that. but you don’t know how it does that.
@ -65,7 +65,7 @@ An iterator not a list, and it doesn’t support indexing,[^index]
meaning it will produce the numbers from 1 to 100,000 before it ends). meaning it will produce the numbers from 1 to 100,000 before it ends).
You can tell an iterator to produce or emit data if you want to use it You can tell an iterator to produce or emit data if you want to use it
(which is often called ‘consuming’ (which is often called ‘consuming’
because if you read data from the pipeline, because if you read something from the pipeline,
it’s usually gone), it’s usually gone),
or you can add a new step to it and hand the new pipeline to someone else, or you can add a new step to it and hand the new pipeline to someone else,
who can then consume it or add even more steps. who can then consume it or add even more steps.
@ -78,12 +78,12 @@ You can tell an iterator to produce or emit data if you want to use it
An important aspect to note is: An important aspect to note is:
adding an operation to the pipeline does nothing adding an operation to the pipeline does nothing
until someone actually starts reading from it, until someone actually starts reading from it,
and even then, only the elements that are consumed are computed.[^inf] and even then, only the elements that are consumed are computed.
This makes it possible to operate on huge data sets while keeping memory usage low, This makes it possible to operate on huge data sets[^inf] while keeping memory usage low,
because only the currently active element has to be held in memory. because only the currently active element has to be held in memory.
[^inf]: This is what makes infinite iterators possible. [^inf]: Huge or even infinite.
They can be very useful and are used a lot in functional languages, Infinite iterators can be very useful and are used a lot in functional languages,
but they’re not today’s topic. but they’re not today’s topic.
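The laziness described above can be made visible with a small Java sketch. It builds a pipeline over an infinite source and counts how often the mapping step actually runs. The names `LazyPipeline` and `doubledNaturals` and the counter parameter are assumptions for this illustration only.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {
    // Builds a pipeline over an infinite source. Nothing is computed here;
    // the counter only changes once someone starts consuming elements.
    static Stream<Long> doubledNaturals(AtomicInteger counter) {
        return Stream.iterate(1L, n -> n + 1)
                .map(n -> {
                    counter.incrementAndGet(); // one increment per element that is really doubled
                    return n * 2;
                });
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Stream<Long> pipeline = doubledNaturals(counter);
        System.out.println(counter.get()); // 0, adding the step did no work

        List<Long> firstThree = pipeline.limit(3).collect(Collectors.toList());
        System.out.println(firstThree);    // [2, 4, 6]
        System.out.println(counter.get()); // 3, only the consumed elements were computed
    }
}
```

Without the `limit(3)`, collecting this stream would never finish, which is why consuming an infinite pipeline always needs some short-circuiting step.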
## Cold, hard numbers ## Cold, hard numbers
@ -179,7 +179,8 @@ Kotlin.streamWrappedInSequence avgt 25 3829.209 ± 33.569 ms/op
Kotlin.withGenerator avgt 25 8374.149 ± 880.647 ms/op Kotlin.withGenerator avgt 25 8374.149 ± 880.647 ms/op
``` ```
(full JMH output)[https://ruru.moe/pSK13p8]. ([full JMH output](https://ruru.moe/pSK13p8))
Unsurprisingly, using Streams from Java and Kotlin is almost identical in terms of performance. Unsurprisingly, using Streams from Java and Kotlin is almost identical in terms of performance.
The same is true for imperative loops, The same is true for imperative loops,
meaning Kotlin ranges introduce no overhead compared to incrementing for loops. meaning Kotlin ranges introduce no overhead compared to incrementing for loops.
@ -187,8 +188,8 @@ The same is true for imperative loops,
More surprisingly, using Sequences is an order of magnitude slower. More surprisingly, using Sequences is an order of magnitude slower.
That was not at all according to my expectations, so I investigated. That was not at all according to my expectations, so I investigated.
As it turns out, Java’s `LongStream` exists because Stream<Long> is *much* slower. As it turns out, Java’s `LongStream` exists because `Stream<Long>` is *much* slower.
The JVM has to use `Long` rather than `long` when the type is used for generics, The JVM has to use `Long` (uppercase) rather than `long` when the type is used for generics,
which involves an additional boxing step and the allocation for the `Long` object.[^primitives] which involves an additional boxing step and the allocation for the `Long` object.[^primitives]
Still, we now know that Streams have about 25% overhead compared to the simple loop for this example, Still, we now know that Streams have about 25% overhead compared to the simple loop for this example,
that generating sequences is a comparatively slow process, that generating sequences is a comparatively slow process,
@ -202,15 +203,19 @@ Still, we now know that Streams have about 25% overhead compared to the simple l
so a list of longs will always convert the `long` to `Long` before adding it. so a list of longs will always convert the `long` to `Long` before adding it.
That last point seemed odd, so I attached a profiler to see where the CPU time is lost. That last point seemed odd, so I attached a profiler to see where the CPU time is lost.
![Flamegraph of `streamWrappedInSequence()`](https://i.kageru.moe/knT2Eg.png) ![Flamegraph of `streamWrappedInSequence()`](https://i.kageru.moe/knT2Eg.png)
We can see that the `LongStream` can produce a `PrimitiveIterator.OfLong` that is used as a source for the Sequence. We can see that the `LongStream` can produce a `PrimitiveIterator.OfLong` that is used as a source for the Sequence.
The operation of boxing a primitive `long` into an object `Long` The operation of boxing a primitive `long` into an object `Long`
(that’s the `Long.valueOf()` step) takes almost as long as advancing the underlying iterator itself. (that’s the `Long.valueOf()` step) takes almost as long as advancing the underlying iterator itself.
7.7% of the CPU time is spent in `Sequence.hasNext()`. 7.7% of the CPU time is spent in `Sequence.hasNext()`.
The exact breakdown of that looks as follows: The exact breakdown of that looks as follows:
![Checking if a Sequence has more elements](https://i.kageru.moe/k4NHhR.png) ![Checking if a Sequence has more elements](https://i.kageru.moe/k4NHhR.png)
The Sequence introduces very little overhead here, as it just delegates to `hasNext()` of the underlying iterator. The Sequence introduces very little overhead here, as it just delegates to `hasNext()` of the underlying iterator.
Worth noting is that the iterator calls `accept` as part of `hasNext()`, Worth noting is that the iterator calls `accept()` as part of `hasNext()`,
which will already advance the underlying iterator. which will already advance the underlying iterator.
The value returned by that will be stored temporarily until `nextLong()` is called. The value returned by that will be stored temporarily until `nextLong()` is called.
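To illustrate that buffering behaviour, here is a simplified sketch of such an adapter in Java. It is not the JDK's actual implementation, only a reduced version of the same idea, and the class name `BufferingLongIterator` is made up for this example.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.function.LongConsumer;
import java.util.stream.LongStream;

// Hypothetical, simplified adapter from a Spliterator.OfLong to a PrimitiveIterator.OfLong.
class BufferingLongIterator implements PrimitiveIterator.OfLong, LongConsumer {
    private final Spliterator.OfLong source;
    private boolean valueReady = false;
    private long buffered;

    BufferingLongIterator(Spliterator.OfLong source) {
        this.source = source;
    }

    // Called by the source from within tryAdvance(); stores the element for later.
    @Override
    public void accept(long value) {
        valueReady = true;
        buffered = value;
    }

    // Checking for more elements already advances the source by one element.
    @Override
    public boolean hasNext() {
        if (!valueReady) {
            source.tryAdvance(this);
        }
        return valueReady;
    }

    // Hands out the buffered primitive without boxing it.
    @Override
    public long nextLong() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        valueReady = false;
        return buffered;
    }

    public static void main(String[] args) {
        BufferingLongIterator it =
                new BufferingLongIterator(LongStream.rangeClosed(1, 3).spliterator());
        long sum = 0;
        while (it.hasNext()) {
            sum += it.nextLong();
        }
        System.out.println(sum); // 6
    }
}
```

The consequence is that the cost of producing an element shows up under `hasNext()` in a profile, while `nextLong()` only returns a field.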
@ -250,7 +255,13 @@ The next snippet uses a simple wrapper class that guarantees that we have no pri
I’ll use this opportunity to also compare parallel and sequential streams. I’ll use this opportunity to also compare parallel and sequential streams.
The steps are simple: The steps are simple:
Take a long -> create a LongWrapper from it, double the contained value (which creates a new LongWrapper), extract the value, compute the sum.
1. take a long
1. create a LongWrapper from it
1. double the contained value (which creates a new LongWrapper)
1. extract the value
1. calculate the sum
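
As a rough Java sketch of these five steps, assuming a minimal wrapper class: the `LongWrapper` below and its `doubled()` and `getValue()` methods are guesses for illustration and may differ from the class used in the benchmark.

```java
import java.util.stream.LongStream;

class WrapperPipeline {
    // Hypothetical wrapper type that forces every element to be an object.
    static final class LongWrapper {
        private final long value;

        LongWrapper(long value) {
            this.value = value;
        }

        // Doubling allocates a new wrapper instead of mutating this one.
        LongWrapper doubled() {
            return new LongWrapper(value * 2);
        }

        long getValue() {
            return value;
        }
    }

    static long sequential(long upTo) {
        return LongStream.rangeClosed(1, upTo)    // 1. take a long
                .mapToObj(LongWrapper::new)        // 2. create a LongWrapper from it
                .map(LongWrapper::doubled)         // 3. double the contained value
                .mapToLong(LongWrapper::getValue)  // 4. extract the value
                .sum();                            // 5. calculate the sum
    }

    // Same pipeline, but the work may be split across several threads.
    static long parallel(long upTo) {
        return LongStream.rangeClosed(1, upTo)
                .parallel()
                .mapToObj(LongWrapper::new)
                .map(LongWrapper::doubled)
                .mapToLong(LongWrapper::getValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sequential(100_000)); // 10000100000
        System.out.println(parallel(100_000));   // 10000100000
    }
}
```
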
That may sound overcomplicated, That may sound overcomplicated,
but it’s sadly close to the reality of enterprise code. but it’s sadly close to the reality of enterprise code.
Wrapper types are everywhere. Wrapper types are everywhere.
@ -303,10 +314,10 @@ NonPrimitive.stream avgt 25 44673.318 ± 1325.832 ms/op
NonPrimitive.parallelStream avgt 25 33856.919 ± 249.911 ms/op NonPrimitive.parallelStream avgt 25 33856.919 ± 249.911 ms/op
``` ```
Full results are in the (JMH log from earlier)[https://ruru.moe/pSK13p8]. Full results are in the [JMH log from earlier](https://ruru.moe/pSK13p8).
The overhead of Java streams is much higher than that of Kotlin Sequences, The overhead of Java streams is much higher than that of Kotlin Sequences,
and even a parallel Stream is slower than using a Sequence. and even a parallel Stream is slower than using a Sequence,
even though Sequences only use a single thread, even though Sequences only use a single thread,
but both are miles behind the simple for loop. but both are miles behind the simple for loop.
My first assumption was that the compiler optimized away the wrapper type and just added the longs, My first assumption was that the compiler optimized away the wrapper type and just added the longs,
@ -326,12 +337,13 @@ This tells us that not only do Streams/Sequences have a very measurable overhead
## Conclusion ## Conclusion
Overall, I think that Kotlin’s Sequences are a good addition to the language. Overall, I think that Kotlin’s Sequences are a good addition to the language, despite their flaws.
They fall behind Streams when working with primitives They are significantly slower than Streams when working with primitives
because the Java standard library has subtypes for many generic constructs to more efficiently handle primitive types, because the Java standard library has subtypes for many generic constructs to more efficiently handle primitive types,
but in most real-world JVM applications (that being enterprise-level bloatware), but in most real-world JVM applications (that being enterprise-level bloatware),
primitives are the exception rather than the rule. primitives are the exception rather than the rule.
Still, Kotlin has some types that optimize for this, such as `LongIterator`, Still, Kotlin already has some types that optimize for this,
such as `LongIterator`,
but without a `LongSequence` to go with it, but without a `LongSequence` to go with it,
the boxing will still happen eventually, the boxing will still happen eventually,
and all the performance gains are void. and all the performance gains are void.
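The same boundary exists on the Java side, which makes for a compact illustration. In this sketch (the class and method names are invented for the example), one primitive iterator type is consumed twice: once through `nextLong()` and once through the generic `Iterator<Long>` interface, where every element has to become a `Long` object.

```java
import java.util.Iterator;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

class BoxingBoundary {
    // Stays primitive: nextLong() returns a long, so no Long objects are needed.
    static long sumPrimitive(PrimitiveIterator.OfLong it) {
        long sum = 0;
        while (it.hasNext()) {
            sum += it.nextLong();
        }
        return sum;
    }

    // Generic view of the same kind of iterator: next() must return a Long,
    // so each element is boxed and then unboxed again for the addition.
    static long sumBoxed(Iterator<Long> it) {
        long sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(LongStream.rangeClosed(1, 100_000).iterator())); // 5000050000
        System.out.println(sumBoxed(LongStream.rangeClosed(1, 100_000).iterator()));     // 5000050000
    }
}
```

The results are identical; the difference is only in how many objects are created along the way, which is the cost a primitive-specialised sequence type would avoid.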
@ -344,27 +356,31 @@ Apart from the performance, Sequences are also a lot easier to understand and ev
Implementing your own Sequence requires barely more than an implementation of the underlying iterator, Implementing your own Sequence requires barely more than an implementation of the underlying iterator,
as can be seen in [CoalescingSequence](https://git.kageru.moe/kageru/Sekwences/src/branch/master/src/main/kotlin/moe/kageru/sekwences/CoalescingSequence.kt) as can be seen in [CoalescingSequence](https://git.kageru.moe/kageru/Sekwences/src/branch/master/src/main/kotlin/moe/kageru/sekwences/CoalescingSequence.kt)
which I implemented last year to get a feeling for how all of this works. which I implemented last year to get a feeling for how all of this works.
Streams on the other hand are a lot more complex. They actually extend `Consumer<T>`, Streams on the other hand are a lot more complex. They extend `Consumer<T>`,
so a `Stream<T>` is just a `void consume(T input)` that can be called repeatedly. so a `Stream<T>` is actually just a `void consume(T input)` that can be called repeatedly.
That makes it a lot harder to grasp where data is coming from and how it is requested, at least to me. That makes it a lot harder to grasp where data is coming from and how it is requested, at least to me.
Simplicity is often underrated in software, but I consider it a huge plus for Sequences. Simplicity is often underrated in software, but I consider it a huge plus for Sequences.
I will continue to use them liberally, I will continue to use them liberally,
unless I find myself in a situation where I need to process a huge number of primitives. unless I find myself in a situation where I need to process a huge number of primitives.
And even then, I now know that Java’s Streams are a good choice. And even then, I now know that Java’s Streams are a good alternative,
as long as my code isn’t plain stupid and in dire need of the JIT optimizer.
25% might sound like a lot, 25% might sound like a lot,
but it’s more than worth it if it means leaving code that is much easier to understand and modify for the next person. but it’s more than worth it if it means leaving code that is much easier to understand and modify for the next person.
Unless you’re actually in a very performance-critical part of your application, Unless you’re actually in a very performance-critical part of your application,
but if you ever find yourself in that situation, but if you ever find yourself in that situation,
you should switch to a different language. you should switch to a different language.
Writing simple and correct code should always be more important than writing fast code.
\
\
On that note: I was originally going to include Rust’s iterators here for comparison, \
\
On the note of switching languages:
I was originally going to include Rust’s iterators here for comparison,
but rustc optimized away all of my benchmarks with [constant time solutions](https://godbolt.org/z/iJaWVP). but rustc optimized away all of my benchmarks with [constant time solutions](https://godbolt.org/z/iJaWVP).
It’s a fascinating topic, That was a fascinating discovery for me,
and I might write a separate blog post and I might write a separate blog post
where I dissect some of the assembly that rustc/LLVM produced, where I dissect some of the assembly that rustc/LLVM produced,
but I need to properly understand it myself first. but I feel like I’ll need to learn a few more things about compilers first.