Tags: algorithm, language-agnostic, string, artificial-intelligence

Building a reverse language dictionary


I was wondering what it takes to build a reverse language dictionary.

The user enters something along the lines of: "red edible fruit" and the application would return: "tomatoes, strawberries, ..."

I assume the results would be based on some form of keyword matching, such as synonyms, or on some kind of string search.

This is an online implementation of this concept.

What's going on there and what is involved?

EDIT 1: The question is more about the "how" than the "which tool"; however, feel free to suggest tools you think would do the job.


Solution

  • Any approach would essentially involve a normalized database. Here is a basic example of what the database structure might look like:

    // terms
    +-------------------+
    | id | name         |
    | 1  | tomatoes     |
    | 2  | strawberries |
    | 3  | peaches      |
    | 4  | plums        |
    +-------------------+
    
    // descriptions
    +-------------------+
    | id | name         |
    | 1  | red          |
    | 2  | edible       |
    | 3  | fruit        |
    | 4  | purple       |
    | 5  | orange       |
    +-------------------+
    
    // connections
    +-------------------------+
    | terms_id | descript_id  |
    | 1        | 1            |
    | 1        | 2            |
    | 1        | 3            |
    | 2        | 1            |
    | 2        | 2            |
    | 2        | 3            |
    | 3        | 1            |
    | 3        | 2            |
    | 3        | 5            |
    | 4        | 1            |
    | 4        | 2            |
    | 4        | 4            |
    +-------------------------+
    

    This is a fairly basic setup, but it should give you an idea of how a many-to-many relationship is modeled in a database with a lookup (junction) table.

    Your application would have to split the input string into words and normalize them, for example by stripping suffixes so that "fruits" matches "fruit". It would then query the connections table and return the matching terms.
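A minimal sketch of that lookup, assuming Python's built-in `sqlite3` module and an in-memory database. It loads the three example tables verbatim, normalizes the query with a deliberately naive plural-stripping rule (a stand-in for a real stemmer), and ranks terms by how many of the query's words they are connected to. The names `build_db`, `normalize`, `reverse_lookup` and the `require_all` flag are hypothetical, chosen for illustration; they are not part of the original answer.

```python
import re
import sqlite3

# Schema mirroring the answer's three tables; "connections" is the
# many-to-many lookup (junction) table between terms and descriptions.
SCHEMA = """
CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE descriptions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE connections (
    terms_id    INTEGER NOT NULL REFERENCES terms(id),
    descript_id INTEGER NOT NULL REFERENCES descriptions(id),
    PRIMARY KEY (terms_id, descript_id)
);
"""

# Sample rows copied from the example tables.
TERMS = [(1, "tomatoes"), (2, "strawberries"), (3, "peaches"), (4, "plums")]
DESCRIPTIONS = [(1, "red"), (2, "edible"), (3, "fruit"), (4, "purple"), (5, "orange")]
CONNECTIONS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3),
               (3, 1), (3, 2), (3, 5), (4, 1), (4, 2), (4, 4)]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO terms VALUES (?, ?)", TERMS)
    conn.executemany("INSERT INTO descriptions VALUES (?, ?)", DESCRIPTIONS)
    conn.executemany("INSERT INTO connections VALUES (?, ?)", CONNECTIONS)
    return conn


def normalize(text: str) -> list[str]:
    """Lowercase, split into words, and strip a trailing plural 's'.

    Hypothetical, deliberately naive stemmer; it assumes the descriptions
    table stores singular, lowercase words.
    """
    words = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return words


def reverse_lookup(conn: sqlite3.Connection, query: str,
                   require_all: bool = False) -> list[tuple[str, int]]:
    """Return (term, number_of_matched_descriptions), best matches first.

    With require_all=True, only terms linked to every query word are kept.
    """
    words = sorted(set(normalize(query)))
    if not words:
        return []
    placeholders = ",".join(["?"] * len(words))
    sql = (
        "SELECT t.name, COUNT(DISTINCT d.id) AS hits "
        "FROM terms t "
        "JOIN connections c ON c.terms_id = t.id "
        "JOIN descriptions d ON d.id = c.descript_id "
        f"WHERE d.name IN ({placeholders}) "
        "GROUP BY t.id, t.name "
    )
    params: list = list(words)
    if require_all:
        # Strict mode: the term must be linked to every distinct query word.
        sql += "HAVING COUNT(DISTINCT d.id) = ? "
        params.append(len(words))
    sql += "ORDER BY hits DESC, t.name"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(reverse_lookup(db, "red edible fruits"))
    # → [('strawberries', 3), ('tomatoes', 3), ('peaches', 2), ('plums', 2)]
    print(reverse_lookup(db, "red edible fruits", require_all=True))
    # → [('strawberries', 3), ('tomatoes', 3)]
```

Ranking by match count means a partial description still returns something useful, while `require_all=True` gives strict matching. Synonym support (mapping "crimson" to "red", say) could be layered on as another table that rewrites query words into description names before the query runs; that extension is an assumption, not something the answer specifies.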