Stream.concat vs. new ArrayList Performance
During a code review I suggested a code improvement related to JDK8+ streams. The original code looked very similar like the following:
1List<Element> result = content.getFancyStuffs().stream()
2 .flatMap(item -> {
3 List<Element> objects = new ArrayList<>();
4 objects.add(item.getElement());
5 objects.addAll(item.getElements());
6 return objects.stream();
7 })
8 .collect(toList());
Some more details here. The getFancyStuffs()
returns a list of FancyStuff
elements.
The FancyStuff
class contains two getters where getElement()
returns a single Element
whereas the getElements()
returns (Guess what?) a list of Element
's.
The interesting part was the lambda which creates a new ArrayList
and adds a single element
objects.add(item.getElement())
and the second part which adds several elements
via objects.addAll(item.getElements)
.
My suggestion based on better readability was to use the following code instead:
1List<Element> result = content.getFancyStuffs()
2 .stream()
3 .flatMap(fs -> Stream.concat(Stream.of(fs.getElement()), fs.getElements().stream()))
4 .collect(Collectors.toList());
So far so good. But after a time I began to think about the two solutions. I asked myself: Which is the faster one? Which one uses more memory? (The usual questions a developer is asking? Don't you?).
So what would you guess which is the faster solution? The first one or the second one?
My guess was that the first solution will win, but based on some assumptions. This
means the number of elements which are coming through content.getFancyStuffs().stream()..
are more or less small (less than 20) and furthermore the number of elements wich are
returned by item.getElements()
are fairly small as well (less than 20).
The only thing which can help here to get a reliable answer is to measure it. No assumptions, no educated guesses etc. So I have to make a real performance measurement. So I created a separate project which really measures the performance.
So the first code part for performance measurement looks like this:
Solution 1
1@Benchmark
2public List<Element> with_new_arraylist(Container content) {
3 return content.getFancyStuffs().stream().flatMap(item -> {
4 ArrayList<Element> objects = new ArrayList<>();
5 objects.add(item.getElement());
6 objects.addAll(item.getElements());
7 return objects.stream();
8 }).collect(Collectors.toList());
9}
and the second part:
Solution 2
1@Benchmark
2public List<Element> with_stream_concat(Container content) {
3 return content.getFancyStuffs()
4 .stream()
5 .flatMap(fs -> Stream.concat(Stream.of(fs.getElement()), fs.getElements().stream()))
6 .collect(Collectors.toList());
7}
while writing the above code I thought about some parts of it and I got two other possible variations.
Solution 3
The following example where I put elements directly into the constructor of the
ArrayList
. This means it could only happen that in rarer cases the size of the
array list must be resized which depends on the number of elements in item.getElements()
.
1@Benchmark
2public List<Element> with_new_arraylist_constructor(Container content) {
3 return content.getFancyStuffs().stream().flatMap(item -> {
4 ArrayList<Element> objects = new ArrayList<>(item.getElements());
5 objects.add(item.getElement());
6 return objects.stream();
7 }).collect(Collectors.toList());
8}
Solution 4
Finally this one where I already calculate the size of the final list by giving the number of elements via the constructor. This will prevent the resizing of the array list at all cause the size will fit always.
1@Benchmark
2public List<Element> with_new_arraylist_constructor_size(Container content) {
3 return content.getFancyStuffs().stream().flatMap(item -> {
4 ArrayList<Element> objects = new ArrayList<>(item.getElements().size() + 1);
5 objects.add(item.getElement());
6 objects.addAll(item.getElements());
7 return objects.stream();
8 }).collect(Collectors.toList());
9}
Measurement
The measurement has been done on an Intel Xeon Machine with 3.50GHz with CentOS Linux release 7.6.1810 (Core). The full source code of the project can be found https://github.com/khmarbaise/performance-concat.
Basic Measurement
So I began very simple only with the first two solutions (1+2):
The following is only an excerpt of the above The detailed measurement result
(I have remove the prefix BenchmarkStreamConcat
from all results here in the post).
1Benchmark Mode Cnt Score Error Units
2with_new_arraylist avgt 15 22.350 ± 1.415 us/op
3with_stream_concat avgt 15 15.716 ± 2.561 us/op
So if you take a look at the results above you can already see that for a small amount of
elements (49 getElements, 50 FanceStuff instances) the one with stream_concat
is faster.
Hm..is this really true? Not a measuring error or coincidence?
Parameterized Measurement
To reduce the likely of stumbling over a coincidence or a measuring error I changed the code which now takes a parameter and enhanced the number of elements. I stick with the two solutions (1+2).
The getElements()
results always in 49 elements where as the number of FancyStuff
elements varies (see count
). The following result shows that the version
with stream_concat
is always faster.
1Benchmark (count) Mode Cnt Score Error Units
2with_new_arraylist 50 avgt 15 21.759 ± 0.797 us/op
3with_new_arraylist 100 avgt 15 43.309 ± 1.449 us/op
4with_new_arraylist 1000 avgt 15 498.693 ± 103.550 us/op
5with_new_arraylist 2000 avgt 15 988.483 ± 80.574 us/op
6with_new_arraylist 5000 avgt 15 3379.189 ± 376.885 us/op
7with_stream_concat 50 avgt 15 17.695 ± 3.601 us/op
8with_stream_concat 100 avgt 15 38.559 ± 13.014 us/op
9with_stream_concat 1000 avgt 15 458.131 ± 95.578 us/op
10with_stream_concat 2000 avgt 15 815.142 ± 183.491 us/op
11with_stream_concat 5000 avgt 15 2682.883 ± 287.596 us/op
Interestingly this is not only the case for larger number of elements. It is also for a small number of elements the case.
Running all solutions
So finally I ran all solutions (1+2+3+4) with different numbers (count, elementCount) with different combinations. I honestly have to admit that I underestimated the time it took to finish that test (it took 13 hours+).
I just picked up some examples of the measured times here:
1Benchmark (count) (elementCount) Mode Cnt Score Error Units
2with_new_arraylist 10 1000 avgt 15 77,358 ± 2,957 us/op
3with_new_arraylist_constructor 10 1000 avgt 15 76,431 ± 1,544 us/op
4with_new_arraylist_constructor_size 10 1000 avgt 15 73,156 ± 0,118 us/op
5with_stream_concat 10 1000 avgt 15 68,461 ± 0,048 us/op
6....
7with_new_arraylist 1000 1000 avgt 15 12864,161 ± 164,193 us/op
8with_new_arraylist_constructor 1000 1000 avgt 15 12703,437 ± 346,234 us/op
9with_new_arraylist_constructor_size 1000 1000 avgt 15 12401,783 ± 387,433 us/op
10with_stream_concat 1000 1000 avgt 15 11613,553 ± 632,684 us/op
Another run
So I ran also a solution with all possible options im JMH which took very long (1 day + 15 hours+).
So I will pick up some examples of the measured times here:
1Benchmark (count) (elementCount) Mode Cnt Score Error Units
2with_new_arraylist 1000 500 ss 3 22169330,000 ± 25379203,013 ns/op
3with_new_arraylist_constructor 1000 500 ss 3 6425606,000 ± 4699006,445 ns/op
4with_new_arraylist_constructor_size 1000 500 ss 3 22911512,667 ± 13112575,975 ns/op
5with_stream_concat 1000 500 ss 3 5328312,333 ± 4162857,199 ns/op
6...
7with_new_arraylist 1000 1000 ss 3 13283635,000 ± 7645310,577 ns/op
8with_new_arraylist_constructor 1000 1000 ss 3 35622844,333 ± 49138969,434 ns/op
9with_new_arraylist_constructor_size 1000 1000 ss 3 14122526,333 ± 4304061,268 ns/op
10with_stream_concat 1000 1000 ss 3 13405022,333 ± 17950218,966 ns/op
So finally the question comes: What do those numbers mean?
quote from the JMH output:
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell.