Search code examples
multithreadingconcurrencymergejava-8concurrenthashmap

Java 8 ConcurrentHashMap merge vs computeIfAbsent


I'm working exercises from the book "Java SE 8 for the Really Impatient" by Cay S. Horstmann. There're 2 exercises that ask for different implementations of the same algorithm, one using merge, the other computeIfAbsent. I've implemented the program using merge but can't figure out how to use computeIfAbsent to do the same thing. It seems to me that computeIfPresent would be a better fit, because merge only works if the key is present and so does computeIfPresent.

Problem statement:

Write an application in which multiple threads read all words from a collection of files. Use a ConcurrentHashMap<String, Set<File>> to track in which files each word occurs. Use the merge method to update the map.

My code using merge:

public static Map<String, Set<File>> reverseIndexUsingMerge(final Path path)
        throws IOException {
    final ConcurrentHashMap<String, Set<File>> map = new ConcurrentHashMap<>();

    final BiConsumer<? super String, ? super Set<File>> action = (key,
        value) -> map.merge(key, value, (existingValue, newValue) -> {
        LOGGER.info("Received key: {}, existing value: {}, new value: {}.",
            key, existingValue, newValue);

        newValue.addAll(existingValue);

        return newValue;
    });

    commonPool().invokeAll(
        find(path, 1,
            (p, fileAttributes) -> fileAttributes.isRegularFile())
            .map(p -> new ReverseIndex(p, action))
            .collect(toList()));

    return unmodifiableMap(map);
}

private static class ReverseIndex implements Callable<Void> {
    private final Path p;
    private final BiConsumer<? super String, ? super Set<File>> action;

    private static final Pattern AROUND_WHITESPACE = compile("\\s");

    private ReverseIndex(final Path p,
        final BiConsumer<? super String, ? super Set<File>> action) {
        this.p = p;
        this.action = action;
    }

    @Override
    public Void call() throws Exception {
        reverseIndex().forEach(action);

        return null;
    }

    private Map<String, Set<File>> reverseIndex() {
        /* File stream needs to be closed. */
        try (Stream<String> lines = lines(p, UTF_8)) {
        return lines.flatMap(AROUND_WHITESPACE::splitAsStream)
            .collect(
                groupingBy(String::toString,
                    mapping(word -> p.toFile(), toSet())));
        } catch (IOException e) {
        LOGGER.error("Something went wrong. Get the hell outta here.",
            e);

        throw new UncheckedIOException(e);
        }
    }
}

Solution

  • Focus on what have to be done, if the value is absent. What you have to do, is to create a new Set value for the absent entry. Of course, if you use an operation which has an atomicity guaranty for the creation of the Set only, adding to the Set will happen concurrently, which requires using a concurrent Set. You can utilize a ConcurrentHashMap to create a de-facto ConcurrentHashSet (which doesn’t exist in that form) by mapping to a fixed value, which is especially simple if you let the value signalling presence be Boolean.TRUE:

    ConcurrentHashMap<String, Set<File>> map=new ConcurrentHashMap<>();
    final BiConsumer<? super String, ? super Set<File>> action =
        (key, value) -> map.computeIfAbsent(key, x->ConcurrentHashMap.newKeySet())
                           .addAll(value);