Search code examples
javafilehashmapfrequency

How to record a string appear frequency faster using HashMap in Java?


I'm studying a decision tree, and the algorithm has a part of record string frequency from file. This file have 30,000 cases and 1.68MB size.

I try to using HashMap to do this, in my main algorithm code, the replace method run about 900 milion times and took me about 30 seconds. Any way I can do it faster?

There are simplify code of my main algorithm code below, it took me about 10 second.

Map<String, Integer> classesCount = new HashMap<>();
int target = 900000000;

classesCount.put("a", 0);
classesCount.put("b", 0);

for(int i = 0; i < target; i++) {
    if (i % 2 == 0) {
        classesCount.replace("a", classesCount.get("a") + 1);
    }
    else {
        classesCount.replace("b", classesCount.get("b") + 1);
    }
}

To make it more clear my actual code, I have a class Value, and I have an array of Value class in main method, this is Values class as below.

public class Value<T extends Comparable<T>> implements Comparable<Value<T>> {
public T value;
public String result;

public Value(T value, String result) {
    this.value = value;
    this.result = result;
}

public int compareTo(Value<T> v) {
    return value.compareTo(v.value);
}

}

this is main method code as below. assume arrayOfValue already have many element and every Value's result just have "a" and "b":

Map<String, Integer> classesCountA = new HashMap<>();
Map<String, Integer> classesCountB = new HashMap<>();
Value[] arrayOfValue = new Value[];
int splitIndex = 55;

classesCountA.put("a", 0);
classesCountA.put("b", 0);
classesCountB.put("a", 0);
classesCountB.put("b", 0);

for(int i = 0; i < arrayOfValue.length; i++) {
    if(i < splitIndex) {
        classesCountA.replace(arrayOfValue[i].result, classesCount.get(arrayOfValue[i].result) + 1);
    }
    else {
        classesCountB.replace(arrayOfValue[i].result, classesCount.get(arrayOfValue[i].result) + 1);
    }
}

Solution

  • You don't need to replace the map's value at all. In contrast to keys map values are allowed to be mutable so all you need is a mutable structure to hold the frequency for each value.

    Thus you could do it like this (simplified):

    class Frequency {
      int value;
    }
    
    Map<String, Frequency> frequencyMap = new HashMap<>();
    
    //iterate over the words
    for(String word : words) {
      //get the mutable frequency for each word
      Frequency f = frequencyMap.get(word);
    
      //if the entry doesn't exist yet put it into the map
      //(you could use computeIfAbsent but it would be slower
      if( f == null ) {
         f = new Frequency();
         frequencyMap.put(word, f);
      }
    
      //just mutate the frequency - no need to change the map again
      f.value++;
    }
    

    On my machine that's about 5x faster than the replace(key, get(key) + 1) approach.