c# thesaurus

How to perform query expansion


I am working on a C# application where the user provides a set of words (typically fewer than 10) and I need to retrieve all the synonyms of these words. This is my first time working with dictionaries and this kind of thing. I need to know the steps to follow, and whether there is an existing dictionary that provides synonyms that I can integrate with my application, or an open source application or code that I can use.


Solution

  • To answer your first question: you can find a thesaurus download here: http://wordpresscloaker.com/blog/download-free-english-thesaurus-format-txt.html

    I make no promises about the quality, accuracy, legality, licensing for use, or completeness of that file. However, it will get you on your way. You need to extract mthesaur.txt and add it to your project folder.

    Next, you need to read in the text file by doing the following:

    // Requires: using System; using System.Collections.Generic; using System.IO;
    var dict = new Dictionary<string, string>();
    // Adjust the path to wherever you placed mthesaur.txt. The using block closes the file when done.
    using (var reader = new StreamReader(File.OpenRead(@"C:\mthesaur.txt")))
    {
        while (!reader.EndOfStream)
        {
            // Read the file line by line.
            var line = reader.ReadLine();

            // Skip null or empty lines. This shouldn't happen, but it is a good sanity check.
            if (string.IsNullOrEmpty(line)) continue;
            // Split only at the first comma: the main word, then the rest of the line (its synonyms).
            // Replacing "mainWord," in the whole line would also strip matching text inside the synonyms.
            var parts = line.Split(new[] { ',' }, 2);
            var mainWord = parts[0];
            var synonyms = parts.Length > 1 ? parts[1] : string.Empty;
            // Add the main word as the key and its comma-separated synonyms as the value.
            try
            {
                dict.Add(mainWord, synonyms);
            }
            catch (ArgumentException)
            {
                Console.WriteLine("Attempted to add {0} to the dictionary but it already exists.", mainWord);
            }
        }
    }
    

    Now that we have everything in a key/value dictionary in C#, you can look up the synonyms for an entered word directly by its key, or use LINQ if you need partial matches. For the input, you could use a drop-down that contains all the keys from the dictionary (not recommended, as it would be extremely large and hard for the user to navigate), a ListBox (better, easier to navigate), or a plain text search box. This doesn't completely answer your question, since nothing here covers the GUI, but it should get you well on your way.
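
As a rough illustration of the lookup step, here is a minimal sketch that expands a handful of user-supplied words into their synonyms. The class name `QueryExpansion` and the method `Expand` are hypothetical names chosen for this example. It assumes the dictionary has the same shape as `dict` above (main word as the key, comma-separated synonyms as the value), and the small in-memory dictionary in `Main` is only a stand-in for the loaded file. Lookups are case-sensitive unless the dictionary was created with a comparer such as `StringComparer.OrdinalIgnoreCase`, which the sample does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryExpansion
{
    // Hypothetical helper: returns the synonyms for each query word found in the thesaurus.
    // Words that are not present as keys are simply left out of the result.
    public static Dictionary<string, List<string>> Expand(
        IDictionary<string, string> dict, IEnumerable<string> queryWords)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (var word in queryWords.Select(w => w.Trim()).Distinct())
        {
            if (word.Length == 0) continue;

            // TryGetValue avoids an exception for words the thesaurus does not contain.
            string line;
            if (!dict.TryGetValue(word, out line)) continue;

            // The value is a comma-separated string, so split it into individual synonyms.
            result[word] = line.Split(',')
                .Select(s => s.Trim())
                .Where(s => s.Length > 0)
                .ToList();
        }
        return result;
    }

    public static void Main()
    {
        // Tiny in-memory stand-in for the dictionary loaded from mthesaur.txt (sample data only).
        var dict = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "happy", "glad,joyful,content" },
            { "big", "large,huge" }
        };

        // "unknownword" has no entry, so it is skipped.
        foreach (var entry in Expand(dict, new[] { "Happy", "unknownword" }))
        {
            Console.WriteLine("{0}: {1}", entry.Key, string.Join(", ", entry.Value));
        }
    }
}
```

For fewer than 10 input words, a direct key lookup like this is enough. LINQ over `dict.Keys` (for example with `StartsWith`) would only be needed for a search box that offers suggestions as the user types.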