HashMap stored on disk is very slow to read back


I have a HashMap that maps external UIDs to a different ID (internal to our app) that has been assigned to the given UID.

e.g.:

  • 123.345.432=00001
  • 123.354.433=00002

The map is checked by UID so that the same internal ID is reused if something is resent to the application.
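A minimal sketch of that lookup, assuming String keys and values and a hypothetical counter that formats new IDs like 00001. `StudyIdAssigner` and `getOrAssign` are illustrative names, not from the original application. The check and the insert run inside one `synchronized` block on the wrapper, which is the lock `Collections.synchronizedMap` uses internally, so two threads cannot assign different IDs to the same UID.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating "same UID -> same internal ID".
class StudyIdAssigner {
    // Maps external UIDs to internal study IDs, e.g. "123.345.432" -> "00001".
    private final Map<String, String> uidToStudyId =
            Collections.synchronizedMap(new HashMap<String, String>());
    // Hypothetical ID counter; only touched while holding the map's lock.
    private int nextId = 1;

    /** Returns the existing internal ID for uid, or assigns and stores a new one. */
    String getOrAssign(String uid) {
        // Lock on the synchronized map itself so check-then-put is atomic.
        synchronized (uidToStudyId) {
            String existing = uidToStudyId.get(uid);
            if (existing != null) {
                return existing; // resent UID: reuse the same internal ID
            }
            String assigned = String.format("%05d", nextId++);
            uidToStudyId.put(uid, assigned);
            return assigned;
        }
    }
}
```

Calling `getOrAssign("123.345.432")` twice returns the same ID both times, while a new UID gets the next number.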

DICOMUID2StudyIdentiferMap is defined as follows:

private static Map DICOMUID2StudyIdentiferMap = Collections.synchronizedMap(new HashMap());

On startup, a successful load from disk overwrites it; otherwise the default empty HashMap is used.

It's read back from disk by doing:

FileInputStream f = new FileInputStream( studyUIDFile );  
ObjectInputStream s = new ObjectInputStream( f );

Map loadedMap = ( Map )s.readObject();
DICOMUID2StudyIdentiferMap = Collections.synchronizedMap( loadedMap );
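For comparison, a minimal sketch of the same load with the empty-map fallback, assuming String keys and values (the original uses raw types). It closes the stream with try-with-resources, buffers the file reads, and copies the loaded entries into a fresh HashMap before wrapping it. `StudyUidStore` and `load` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class StudyUidStore {
    /**
     * Loads the map from disk, or returns an empty synchronized map if the
     * file is missing or unreadable (the "default empty HashMap" case).
     */
    @SuppressWarnings("unchecked")
    static Map<String, String> load(File studyUIDFile) {
        if (!studyUIDFile.exists()) {
            return Collections.synchronizedMap(new HashMap<String, String>());
        }
        // try-with-resources closes the streams even if readObject throws;
        // BufferedInputStream avoids many small unbuffered reads from the file.
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(studyUIDFile)))) {
            Map<String, String> loaded = (Map<String, String>) in.readObject();
            // Copy into a plain HashMap before wrapping, so an object that is
            // already a synchronized wrapper is not wrapped a second time.
            return Collections.synchronizedMap(new HashMap<String, String>(loaded));
        } catch (IOException | ClassNotFoundException e) {
            return Collections.synchronizedMap(new HashMap<String, String>());
        }
    }
}
```

The copy is a deliberate design choice: `Collections.synchronizedMap` returns a serializable wrapper, so if the wrapper itself was written to disk, wrapping the deserialized object again would add one more wrapper layer on every load.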

The HashMap is written to disk using:

FileOutputStream f = new FileOutputStream( studyUIDFile );
ObjectOutputStream s = new ObjectOutputStream( f );

s.writeObject(DICOMUID2StudyIdentiferMap);
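A matching sketch for the write side, again assuming String keys and values, with `StudyUidWriter` and `save` as hypothetical names. It snapshots the map into a plain HashMap while holding the map's lock (iterating a synchronized map requires manual synchronization on it) and closes the stream so the data is flushed and the file handle released.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class StudyUidWriter {
    /** Writes a plain HashMap snapshot of map to studyUIDFile. */
    static void save(Map<String, String> map, File studyUIDFile) throws IOException {
        HashMap<String, String> snapshot;
        // Copying iterates the map, so hold the synchronized map's own lock.
        synchronized (map) {
            snapshot = new HashMap<String, String>(map);
        }
        // Closing the ObjectOutputStream flushes the buffer and closes the file.
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(studyUIDFile)))) {
            out.writeObject(snapshot);
        }
    }
}
```

Because only the HashMap snapshot is serialized, the file contains just the entries, not the synchronized wrapper around them.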

The issue is that when running locally in Eclipse, performance is fine, but when the application runs in normal use on a machine, the HashMap takes several minutes to load from disk. Once loaded, it also takes a long time to check for a previous value, e.g. by seeing whether DICOMUID2StudyIdentiferMap.put(..., ...) returns a value.

I load the same map object in both cases; it's a ~400 KB file. The HashMap it contains has about 3000 key-value pairs.
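One way to judge whether ~400 KB is reasonable for ~3000 entries is to serialize a freshly built map and compare sizes. This is a sketch with made-up UID-like keys in the style of the example above; `SerializedSizeCheck` and `sampleMap` are hypothetical. Real DICOM UIDs can be up to 64 characters, so substitute your actual UIDs for a meaningful comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class SerializedSizeCheck {
    /** Returns the number of bytes Java serialization produces for obj. */
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    /** Builds a map with n made-up UID-like keys and 5-digit internal IDs. */
    static Map<String, String> sampleMap(int n) {
        Map<String, String> map = new HashMap<String, String>();
        for (int i = 1; i <= n; i++) {
            map.put("123.345." + i, String.format("%05d", i));
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        int size = serializedSize(sampleMap(3000));
        System.out.println("Serialized size of 3000 sample entries: " + size + " bytes");
    }
}
```

If the file on disk is far larger than a fresh serialization of the same entries, it holds more than the bare map.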

Why is it so slow on one machine, but not in Eclipse?

The machine is a VM running XP. It has only recently started becoming slow to read the HashMap, so it must be related to its size; however, I don't think 400 KB is very big.

Any advice welcome, TIA


Solution

  • Not sure that serialising your Map is the best option. If the Map is disk-based for persistence, why not use a library that's designed for disk? Check out Kyoto Cabinet. It's actually written in C++, but there is a Java API. I've used it several times; it's very easy to use, very fast, and can scale to a huge size.

    This is an example I'm copy/pasting for Tokyo Cabinet, the predecessor of Kyoto Cabinet, but it's basically the same:

    import java.io.IOException;
    import tokyocabinet.HDB;
    
    ....
    
    String dir = "/path/to/my/dir/";
    HDB hash = new HDB();
    
    // open the hash for read/write, creating it if it does not exist on disk
    if (!hash.open(dir + "unigrams.tch", HDB.OWRITER | HDB.OCREAT)) {
        throw new IOException("Unable to open " + dir + "unigrams.tch: " + hash.errmsg());
    }
    
    // Add something to the hash
    hash.put("blah", "my string");
    
    // Close it; close() returns false on failure
    if (!hash.close()) {
        throw new IOException("Unable to close " + dir + "unigrams.tch: " + hash.errmsg());
    }