Search code examples
javahashmapgraph-traversal

Find 'connected components' in graph


I'm building a thesaurus using a HashMap <String,ArrayList<String>> to hold words and their synonyms (this data structure is required).

For the purpose of the assignment, the synonymity relation is considered transitive. (We can imagine the thesaurus as a graph). What I'm trying to accomplish is to print this graph in a text file, with a connected component on each line. In other words, all the words that can be pooled together as synonyms should go on a single line.

public void save() {
    try {
        FileWriter fw = new FileWriter(defaultDefinitionFile);
        BufferedWriter out = new BufferedWriter(fw);
        Set<String> keys = thesaurus.keySet();
        Iterator<String> ite = keys.iterator();
        while (ite.hasNext()) {
            String key = ite.next();
            out.write(key);
            ArrayList<String> synonyms = thesaurus.get(key);
            Iterator<String> i = synonyms.iterator();
            while (i.hasNext()) {
                String syn = i.next();
                out.write(","+syn);
                keys.remove(syn);
            }
            out.write("\r\n");
        }
        out.close();
        fw.close();
    }
    catch (Exception e) {
        System.out.println("Error writing to file");
        e.printStackTrace();
    }
}

This is how I pictured it to happen:

Print a word along with each of its synonyms, then remove those synonyms from the data structure so we don't have duplicate lines.

Problem is of course that I can't delete anything while i'm iterating over the content of the hashmap.

Any alternative approaches I'm missing?

P.S. I'm keeping the 'graph' metaphor throughout only because i needed the title to be eloquent and succint. I understand that this metaphor is limited in usefulness.


Solution

  • You can store the words that were printed in a Set, and then only handle words that are not yet in the set.

    Side remark: though it's true that one can think about this as a graph problem, your code doesn't treat this as such. If we were to treat this as a graph problem, then we wouldn't make the assumption that each word has all its synonyms listed in the corresponding ArrayList, thus calling for the calculation of the symmetric and transitive closure. Only then would we extract the equivalence classes.

    (In reality the synonym relation is not transitive, I know.)