Search code examples
javalistoptimizationsetguava

Fastest way to convert Guava Sets to List


I'm not even sure if this is possible but I'm doing a Set operation like union or intersection and I need to convert that to a List in order to shuffle the list and pass that to a different method that accepts a List instead of a Set. So I convert the result to a List and all is good. But then from the profiler, I see that the operation is taking a long time under load and it's because of the way Guava Sets .size() does. It's not a constant operation like normal java Set.

Here's the code example

    @Test
    void testSet() {
        Set<Character> first = ImmutableSet.of('a', 'b', 'c');
        Set<Character> second = ImmutableSet.of('b', 'c', 'd');

        Set<Character> union = Sets.union(first, second);

        List<Character> characters = new ArrayList<>(union);
    }

I'm trying to find the fastest way to convert a Guava Sets to a List. From digging through the code this is what Guava Sets does https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Sets.java#L694. It's not a constant operation and it hurts the performance under high load. I'm guessing that .size call is from when Java wants to copy a new Collection to a new List it has to know the size to create a List.


Solution

  • Putting "performance under high load" argument aside (I suggest doing proper JMH microbenchmarks if it's really relevant in your use case), Sets operations are memory-optimized, so if you're copying the data right away, you may want to try different approaches which don't call size at all.

    First, Sets.union returns SetView<E> which has immutableCopy(), on which then you can call .asList() view, returning an immutable list (feel free to chain all operations together):

    @Test
    public void testSetCopy() {
        Set<Character> first = ImmutableSet.of('a', 'b', 'c');
        Set<Character> second = ImmutableSet.of('b', 'c', 'd');
    
        Sets.SetView<Character> union = Sets.union(first, second);
        List<Character> characters = union.immutableCopy().asList();
    
        assertThat(characters).containsOnly('a', 'b', 'c', 'd');
    }
    

    Second, you can also consider using Set in the first place, as mentioned by Louis:

    @Test
    public void testMultiset() {
        Set<Character> first = ImmutableSet.of('a', 'b', 'c');
        Set<Character> second = ImmutableSet.of('b', 'c', 'd');
    
        // here it's ugly but maybe you can collect to set in the first place
        ImmutableMultiset<Character> set = ImmutableSet.<Character>builder()
                 .addAll(first)
                 .addAll(second)
                 .build(); // [a, b, c, d]
    
        List<Character> characters = set.asList();
    
        assertThat(characters).containsOnly('a', 'b', 'c', 'd');
    }
    

    That said, YMMV, and again I encourage you to do microbenchmarks before choosing any less readable option.