java arrays parsing dictionary thesaurus

What is the best way to get just the synonyms from Moby Grady Thesaurus in Java?


I'm creating a visual thesaurus which will act as a watered-down version of the one shown here: https://www.visualthesaurus.com/

I'm a new programmer and this will be one of my first projects. I'm using Moby Grady's Thesaurus text file for my thesaurus list but I'm running into issues.

Moby Thesaurus is formatted so that each line has a root word, followed by a comma, followed by like or related words, then a line break, and then another root word...

ex. Root word, like word, like word, like word

The technique I'm using to find the synonyms at the moment goes like this: 1. Enter the word to find. 2. Start at line one, turn the line into a String array, and then test whether wordToFind is in that line. If it is, print the line, then keep searching the remaining lines for wordToFind.

I'm successfully printing out each line that contains my wordToFind, but many of the words on those lines are not good synonyms for it. I'm asking anybody with this kind of experience to help me come up with a way to get words more similar to my wordToFind.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;

public class Thesaurus {
    // Location of the Moby Thesaurus text file
    File godFile = new File("C:\\Users\\Joe\\Documents\\moby.txt");

    void bigBang() {
        try (Scanner inputScanner = new Scanner(new BufferedReader(
                new FileReader(godFile)))) {

            // Not closed on purpose: closing it would also close System.in
            Scanner reader = new Scanner(System.in);
            System.out.print("Synonyms for word: ");
            String theWord = reader.next();

            // Check every line; print each line that contains the word anywhere
            while (inputScanner.hasNextLine()) {
                String line = inputScanner.nextLine();
                String[] splitLine = line.split(",");
                for (String word : splitLine) {
                    // trim() so entries written as ", word" still match
                    if (word.trim().equalsIgnoreCase(theWord)) {
                        System.out.println("Word Found!");
                        System.out.println("Synonyms for " + theWord + ":");
                        System.out.println(Arrays.toString(splitLine));
                        break; // print each matching line only once
                    }
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new Thesaurus().bigBang();
    }
}

Solution

  • This is a more complex NLP problem that requires more than reading in a text file, but we will work with what you have. First, though, I would suggest looking into WordNet, which you can use online or as a download, and which gives you the word senses for each word.

    So it appears from the code above that you are treating the root word and the "like words" the same way. As a result, a line is printed whenever the word you search for appears anywhere in it, including lines where it is only listed as a like word of some other root word. I would suggest you separate the concept of root words from synonyms.

    What you can do at startup is read the entire file into a HashMap<String, List<String>>. The key is the root word and the value is the list of synonyms for that root word. This is how a printed thesaurus works anyway: you look up the root word and it gives you the synonyms. It would not be practical to scan every entry on each query to see whether it contains the term you are looking for.

    Once this one-time map creation has been done, you can then do a simple lookup in the HashMap for the term the user is interested in.
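A minimal sketch of the map approach described above, assuming each line has the form "root,like word,like word". The class name ThesaurusIndex, the methods buildIndex and lookup, and the sample lines are made up for illustration; they are not taken from the Moby files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ThesaurusIndex {

    /** Builds a map of root word -> like words from lines shaped like "root,like word,like word". */
    public static Map<String, List<String>> buildIndex(Iterable<String> lines) {
        Map<String, List<String>> index = new HashMap<>();
        for (String line : lines) {
            // limit -1 keeps empty fields, so parts always has at least one element
            String[] parts = line.split(",", -1);
            String root = parts[0].trim().toLowerCase(Locale.ROOT);
            if (root.isEmpty()) {
                continue; // skip blank or malformed lines
            }
            List<String> related = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) {
                String word = parts[i].trim();
                if (!word.isEmpty()) {
                    related.add(word);
                }
            }
            index.put(root, related);
        }
        return index;
    }

    /** Returns the like words stored under the given root word, or an empty list if it is not a root. */
    public static List<String> lookup(Map<String, List<String>> index, String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        return index.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the real file
        List<String> sampleLines = Arrays.asList(
                "happy,glad,joyful,content",
                "glad,happy,pleased");
        Map<String, List<String>> index = buildIndex(sampleLines);
        System.out.println(lookup(index, "Happy"));  // [glad, joyful, content]
        System.out.println(lookup(index, "joyful")); // [] because "joyful" is not a root word here
    }
}
```

For the real file, something like Files.readAllLines(Paths.get(...)) could supply the lines, provided the whole file fits comfortably in memory.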

    I see that the website you reference uses a graph-based representation, which can certainly be a good idea. It is quite popular for many ontology-based problems. A graph representation lets you follow links from word to word, so that you can find synonyms of synonyms, and so on.
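One way to follow those links is to treat the root-word map as an adjacency list and walk it breadth-first. The sketch below is only an illustration. The class SynonymGraph, the method neighborsWithin, and the sample entries are hypothetical, and it assumes the map's keys and values use the same letter case.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class SynonymGraph {

    /**
     * Breadth-first walk over the synonym links.
     * Returns every word reachable from start within maxDepth links,
     * mapped to its distance from start (1 = direct synonym, 2 = synonym of a synonym, ...).
     */
    public static Map<String, Integer> neighborsWithin(
            Map<String, List<String>> index, String start, int maxDepth) {
        Map<String, Integer> distance = new LinkedHashMap<>(); // keeps discovery order
        Queue<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = distance.get(current);
            if (depth == maxDepth) {
                continue; // do not expand beyond the requested depth
            }
            for (String next : index.getOrDefault(current, Collections.emptyList())) {
                if (!distance.containsKey(next)) { // visit each word only once
                    distance.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        distance.remove(start); // the start word is not its own synonym
        return distance;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: root word -> like words
        Map<String, List<String>> index = new HashMap<>();
        index.put("happy", Arrays.asList("glad", "joyful"));
        index.put("glad", Arrays.asList("happy", "pleased"));
        index.put("pleased", Arrays.asList("satisfied"));

        System.out.println(neighborsWithin(index, "happy", 2));
        // {glad=1, joyful=1, pleased=2}
    }
}
```

The distance could be used to rank results or to decide how far from the center a word is drawn in a visual layout. Words further away tend to be weaker matches, so a small maxDepth is probably a sensible default.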