
HashMap in Java, 100 Million entries


I want to store 100 million terms and their frequencies (from a text database) in a HashMap<String, Double>. It fails with an "Out of Memory" error. I tried increasing the heap space with -Xmx15000M, but the program then runs for half an hour and throws the same exception again. The file from which I am reading the words and frequencies is 1.7 GB.


Solution

  • For word processing like that, the answer is usually a prefix tree (trie) rather than a hash map, if you can live with the longer lookup times. That structure is quite memory-efficient for natural languages, where many words share common prefixes.

    Depending on the input, a Patricia tree might be even better.

    (Also, if these are indeed words from a natural language, are you sure you really need 100,000,000 entries? The number of commonly used words is surprisingly low; commercial solutions (word prediction, spelling correction) rarely use more than 100,000 words, regardless of language.)
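
To make the prefix-sharing idea concrete, here is a minimal sketch of a trie that maps words to frequencies. It is an illustration only, not a tuned implementation: the class name FrequencyTrie and the methods put, get and nodeCount are made up for this example, and the per-node layout (two small parallel arrays searched linearly) is just one possible choice. A Patricia tree would additionally merge chains of single-child nodes, which this sketch does not do.

```java
import java.util.Arrays;

// Hypothetical example class: a plain (uncompressed) trie from words to frequencies.
final class FrequencyTrie {
    // Shared empty arrays so that leaf nodes do not allocate child storage.
    private static final char[] NO_KEYS = new char[0];
    private static final Node[] NO_KIDS = new Node[0];

    private static final class Node {
        char[] keys = NO_KEYS;   // characters leading to the children
        Node[] kids = NO_KIDS;   // kids[i] is the child reached via keys[i]
        double frequency;        // meaningful only when terminal is true
        boolean terminal;        // true if a stored word ends at this node

        // Linear scan; acceptable here because per-node fan-out is usually small.
        Node child(char c) {
            for (int i = 0; i < keys.length; i++) {
                if (keys[i] == c) {
                    return kids[i];
                }
            }
            return null;
        }

        // Returns the existing child for c, or appends a new one.
        Node childOrCreate(char c) {
            Node existing = child(c);
            if (existing != null) {
                return existing;
            }
            Node created = new Node();
            keys = Arrays.copyOf(keys, keys.length + 1);
            kids = Arrays.copyOf(kids, kids.length + 1);
            keys[keys.length - 1] = c;
            kids[kids.length - 1] = created;
            return created;
        }
    }

    private final Node root = new Node();
    private int nodes; // number of nodes, not counting the root

    // Stores or overwrites the frequency for a word.
    void put(String word, double frequency) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int before = node.keys.length;
            Node next = node.childOrCreate(word.charAt(i));
            if (node.keys.length > before) {
                nodes++; // a new node was created for this character
            }
            node = next;
        }
        node.terminal = true;
        node.frequency = frequency;
    }

    // Returns the stored frequency, or null if the word was never inserted.
    Double get(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.child(word.charAt(i));
            if (node == null) {
                return null;
            }
        }
        if (!node.terminal) {
            return null; // only a prefix of stored words, not a word itself
        }
        return node.frequency;
    }

    int nodeCount() {
        return nodes;
    }
}
```

Inserting "car", "card" and "care" into this sketch creates five nodes in total, because all three words share the path c, a, r. Whether the result is actually smaller than a HashMap depends on how much prefix sharing the data has and on the per-node overhead, so it is worth measuring on a sample of the real input.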