
Lucene: new WordnetSynonymParser(boolean dedup, boolean expand, Analyzer analyzer)


The constructor of WordnetSynonymParser accepts three parameters:

boolean dedup, boolean expand, and an Analyzer.

But what do dedup and expand actually do? I don't understand.

The documentation says:

If dedup is true then identical rules (same input, same output) will be added only once.

What does that mean in practice? Could someone give an example? And what does the expand parameter do?

Any help would be appreciated. Thanks!


Solution

  • The dedup value is passed directly to SynonymMap.Builder and does exactly what the documentation says: if two identical synonym rules exist (same input and same output, for example walk -> stroll added twice), only one of them is kept. It is generally safe to set this to true unless you have a specific reason not to.

    To understand expand, here is how WordnetSynonymParser uses it when adding the rules for each synset:

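     if (expand) {
       // expand: add a rule from every word in the synset to every word in it (itself included)
       for (int i = 0; i < size; i++) {
         for (int j = 0; j < size; j++) {
           // third argument is includeOrig: false means the original token is not kept alongside the synonym
           add(synset[i], synset[j], false);
         }
       }
     } else {
       // no expand: add a rule from every word to the first word of the synset only
       for (int i = 0; i < size; i++) {
         add(synset[i], synset[0], false);
       }
     }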
    

    So if expand is true, it adds a rule for every ordered pair of words in the synset, including each word paired with itself. If it is false, it creates rules that replace each word with only the first word in the synset. For example, say we had a set of synonymous words: "walk", "stroll" and "amble".

    With expand set to true, this would generate the rules:

    walk -> walk
    walk -> stroll
    walk -> amble
    stroll -> walk
    stroll -> stroll
    stroll -> amble
    amble -> walk
    amble -> stroll
    amble -> amble
    

    Without expanding, you would just have:

    walk -> walk
    stroll -> walk
    amble -> walk
    

    Generally, I would be inclined to set this to false, so that every synonym is reduced to one canonical term (the first word in the synset), but it depends on your needs.
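To make the effect of both flags concrete, here is a minimal, dependency-free sketch in plain Java. It is not Lucene's API: the class SynonymRuleSketch and its buildRules and addRule methods are hypothetical, plain Strings stand in for CharsRef, and a simple list of "input -> output" strings stands in for SynonymMap.Builder. It mirrors the two loops quoted in the answer and models dedup as "an identical input -> output rule is kept only once". The same synset is fed in twice so that duplicate rules actually appear.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of how expand and dedup shape the synonym rules.
 * Not Lucene's API: Strings stand in for CharsRef, a List for SynonymMap.Builder.
 */
public class SynonymRuleSketch {

    /** Builds "input -> output" rules for every synset, mirroring the parser's loops. */
    public static List<String> buildRules(List<String[]> synsets, boolean dedup, boolean expand) {
        List<String> rules = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // rules already added; consulted only when dedup is true
        for (String[] synset : synsets) {
            int size = synset.length;
            if (expand) {
                // expand == true: every word maps to every word in the synset (itself included)
                for (int i = 0; i < size; i++) {
                    for (int j = 0; j < size; j++) {
                        addRule(rules, seen, dedup, synset[i], synset[j]);
                    }
                }
            } else {
                // expand == false: every word maps only to the first word of the synset
                for (int i = 0; i < size; i++) {
                    addRule(rules, seen, dedup, synset[i], synset[0]);
                }
            }
        }
        return rules;
    }

    private static void addRule(List<String> rules, Set<String> seen, boolean dedup,
                                String input, String output) {
        String rule = input + " -> " + output;
        // With dedup, a rule with the same input and same output is added only once.
        if (dedup && !seen.add(rule)) {
            return;
        }
        rules.add(rule);
    }

    public static void main(String[] args) {
        String[] synset = {"walk", "stroll", "amble"};
        // The same synset listed twice, so duplicate rules are generated.
        List<String[]> synsets = List.of(synset, synset);

        // dedup = true, expand = false: prints [walk -> walk, stroll -> walk, amble -> walk]
        System.out.println(buildRules(synsets, true, false));
        // dedup = false, expand = false: 6 rules, each one appearing twice
        System.out.println(buildRules(synsets, false, false).size());
        // dedup = true, expand = true: the 9 expanded rules listed above
        System.out.println(buildRules(synsets, true, true).size());
        // dedup = false, expand = true: 18 rules
        System.out.println(buildRules(synsets, false, true).size());
    }
}
```

The duplicated synset is only there to trigger dedup; with a single synset both dedup settings produce the same rules, because the loops never generate the same pair twice on their own.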