Search code examples
javaconcurrencyfunctional-programmingjava-streamside-effects

Best practice for iterating over Collection with impure functions in Java


Suppose this is the use case: I want to update a Hashmap cache inside my class. I have a set of keys and some conditions I wanna apply to the keys and the values which are retrieved with the key.


import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Set;

public class App {

    Set<String> keysToUpdate;
    HashMap<String, List<String>> cache;

    void buildCache(){
        keysToUpdate.stream()
            .filter(k -> true)  // some filter
            .forEach(
                key ->{
                    // now values list comes from outside the stream pipeline
                    List<String> values = cache.computeIfAbsent(key, k -> new ArrayList<>());
                    getValuesforKey(key)
                        .stream()
                        .filter(k -> true) //another filter
                        // side effects are introduced
                        .forEach(value -> {
                                //some other operation, for example logging the values added like
                                // log.info("{} value added", value);
                                values.add(value);
                            }
                        );
                }

            );
    }
    private List<String> getValuesforKey(String key) {
        //some method to get the values for the key
        return new ArrayList<>();
    }
}

We are told that shared mutability like this is bad because the execution is not deterministic, but in this specific case I am adding values to a hashmap and I don't care about the order of execution if I know keysToUpdate doesn't contain repeated values.

Are there other aspects I haven't considered? Is this code safe if streams are parallelised?

If not would using the collection's iterator fix the problem? (code below). Or would it be best to use imperative programming instead? In what cases is shared mutability OK within the stream

public class App {

    Set<String> keysToUpdate;
    HashMap<String, List<String>> cache;

    void buildCache(){
        keysToUpdate.stream()
            .filter(k -> true)// some filter
            .collect(Collectors.toList()) // Collect before iterating
            .forEach(
                key ->{
                    // now values list comes from outside the stream pipeline
                    List<String> values = cache.computeIfAbsent(key, k -> new ArrayList<>());
                    getValuesforKey(key)
                        .stream()
                        .filter(k -> true) 、、another filter
                        .collect(Collectors.toList()) // Collect before iterating
                        // side effects are introduced
                        .forEach(value -> {
                                //some other operation, for example logging the values added like
                                // log.info("{} value added", value);
                                values.add(value);
                            }
                        );
                }

            );
    }
    private List<String> getValuesforKey(String key) {
        //some method to get the values for the key
        return new ArrayList<>();
    }
}

Solution

  • When dealing with multithreading, the question to be answered is whether race conditions may happen (that is, if two or more different threads may access the same resource at the same time with at least one of them trying to modify it).

    In your example, the computeIfAbsent method will modify your map if the requested key is not already there. So, two threads can potentially modify the same resource (the cache object). To avoid this, you can obtain (at the beginning of the buildCache method) a thread-safe version of your map by using Collections.synchronizedMap() and then operate on the returned map.

    For what regards the values list, safety depends on whether two threads may operate on the same key, and thus modify the same list. In your examples, keys are a unique as being obtained from a set, so the code is safe.

    Side note: the expected increase of performance depends on the amount of processing that the getValuesForKey method has to perform. If negligible, most threads will just be waiting for a lock on the map, thus making the performance increase also minimal.