I have a HashSet. I need to split a String and add the elements of the string to the HashSet. I wanted to know which one is efficient way of doing this.
Collections.addAll(myHashSet, myString.split(":"));
myHashSet.addAll(List.of(myString.split(":")));
In 2nd option, I see that we are creating extra object of List to create list out of array. But, we are not creating any extra object in 1st approach. So, I think that 1st approach is more efficient. But, I wanted to check.
Collections.addAll(myHashSet, myString.split(":"))
is best.
Other options are very close though, even for very large data sets.
For small elements it really does not matter at all, do not underestimate how strong your computer is.
For bigger stuff, you have to take care to not make any accidental copies or similar:
List.of
creates an extra copy of the whole data just to make it list, badArrays.asList
creates a fake list wrapper around the array, no copies, goodCollections.addAll(...)
adds directly from the array, goodNow, there is also splitAsStream
, which avoids the array to begin with by directly adding during the splitting. Theoretically, thats the best. But in practice, the streams come with a lot of overhead.
It is your best option if the data is larger than your RAM, i.e. if you can not afford creating the array. That said, chances are you could not afford holding the String
to begin with then.
Instead of making educated guesses, you should actually run JMH benchmarks and get reliable results. So here they are.
N
is the size of the arrays, so how many strings there are.
The lower the score, the better.
Benchmark (N) (SEED) Mode Cnt Score Error Units
addSplitAsStream 20 123 avgt 5 0,001 ± 0,001 ms/op
collectionsAddAll 20 123 avgt 5 ≈ 10⁻³ ms/op
setAddAllArrays 20 123 avgt 5 ≈ 10⁻³ ms/op
setAddAllList 20 123 avgt 5 ≈ 10⁻³ ms/op
Benchmark (N) (SEED) Mode Cnt Score Error Units
addSplitAsStream 200 123 avgt 5 0,007 ± 0,001 ms/op
collectionsAddAll 200 123 avgt 5 0,004 ± 0,001 ms/op
setAddAllArrays 200 123 avgt 5 0,004 ± 0,001 ms/op
setAddAllList 200 123 avgt 5 0,004 ± 0,001 ms/op
As per comments, this benchmark includes two new cases forEachAdd
and addMatcherLoop
.
Benchmark (N) (SEED) Mode Cnt Score Error Units
addMatcherLoop 10000 123 avgt 5 0,730 ± 0,017 ms/op
addSplitAsStream 10000 123 avgt 5 0,947 ± 0,015 ms/op
collectionsAddAll 10000 123 avgt 5 0,327 ± 0,005 ms/op
forEachAdd 10000 123 avgt 5 0,334 ± 0,018 ms/op
setAddAllArrays 10000 123 avgt 5 0,329 ± 0,002 ms/op
setAddAllList 10000 123 avgt 5 0,346 ± 0,011 ms/op
As per comments, this benchmark includes two new cases forEachAdd
and addMatcherLoop
.
Benchmark (N) (SEED) Mode Cnt Score Error Units
addMatcherLoop 1000000 123 avgt 5 184,229 ± 14,699 ms/op
addSplitAsStream 1000000 123 avgt 5 222,011 ± 55,131 ms/op
collectionsAddAll 1000000 123 avgt 5 164,837 ± 189,569 ms/op
forEachAdd 1000000 123 avgt 5 198,915 ± 9,983 ms/op
setAddAllArrays 1000000 123 avgt 5 114,747 ± 59,868 ms/op
setAddAllList 1000000 123 avgt 5 115,520 ± 4,620 ms/op
Full JMH test. Adjust N
for the array sizes.
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgs = {"-Xms2G", "-Xmx2G"})
@Warmup(iterations = 2)
@Measurement(iterations = 5)
public class BenchmarkHashSetAddAll {
@Param("1000000")
private int N;
@Param("123")
private long SEED;
private String DATA;
private HashSet<String> TARGET;
public static void main(String[] args) throws RunnerException {
Options options =
new OptionsBuilder().include(BenchmarkHashSetAddAll.class.getSimpleName()).build();
new Runner(options).run();
}
@Setup
public void setup(Blackhole blackhole) {
Random rnd = new Random(SEED);
DATA = Stream.generate(rnd::nextInt)
.limit(N)
.map(String::valueOf)
.collect(Collectors.joining(":"));
TARGET = new HashSet<>();
}
@Benchmark
public void collectionsAddAll(Blackhole blackhole) {
Collections.addAll(TARGET, DATA.split(":"));
blackhole.consume(TARGET);
}
@Benchmark
public void setAddAllList(Blackhole blackhole) {
TARGET.addAll(List.of(DATA.split(":")));
blackhole.consume(TARGET);
}
@Benchmark
public void setAddAllArrays(Blackhole blackhole) {
TARGET.addAll(Arrays.asList(DATA.split(":")));
blackhole.consume(TARGET);
}
@Benchmark
public void addSplitAsStream(Blackhole blackhole) {
Pattern.compile(":").splitAsStream(DATA).forEach(TARGET::add);
blackhole.consume(TARGET);
}
@Benchmark
public void forEachAdd(Blackhole blackhole) {
for (String element : DATA.split(":")) {
TARGET.add(element);
}
blackhole.consume(TARGET);
}
@Benchmark
public void addMatcherLoop(Blackhole blackhole) {
Matcher matcher = Pattern.compile(":").matcher(DATA);
int current = 0;
while (matcher.find()) {
String element = DATA.subSequence(current, matcher.start()).toString();
current = matcher.end();
TARGET.add(element);
}
blackhole.consume(TARGET);
}
}