java lucene acronym

Is there a way to match acronyms with their extended names in Lucene?


I'm indexing people's tweets and their locations using Lucene, but people enter all sorts of odd names as their location.

Is there a way to match names like these, either at indexing time or at query time? 1) USA 2) United States of America 3) United States

1) Oklahoma 2) Ok

and so on...

P.S. I'd like a solution that doesn't require me to write a synonym dictionary on my own.


Solution

  • You can address this either at indexing time or at query time.

    At indexing time, you would need to enrich your data by looking up each term in a synonym dictionary that you provide, and then index both the original term and its synonym with the same postings information.

    Alternatively, you can do the same lookup on the query string and build a BooleanQuery that ORs the original term with its synonyms.
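
The lookup step described above can be sketched in plain Java, without any Lucene dependency. This is only an illustration under stated assumptions: the class `SynonymExpander`, its methods, and the field name `location` are hypothetical, and the dictionary entries are the examples from the question. The same `expand` call could feed either approach: at indexing time you would index every returned term, and at query time each returned term would become one optional (OR-ed) clause.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: groups of equivalent location names, looked up case-insensitively. */
class SynonymExpander {
    // Maps each normalized name to the full group of names it belongs to.
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();

    /** Registers a group of names that should all match each other. */
    void addGroup(String... names) {
        Set<String> group = new LinkedHashSet<>();
        for (String name : names) {
            group.add(normalize(name));
        }
        for (String name : group) {
            groups.put(name, group);
        }
    }

    /** Returns the normalized term plus its known synonyms (just the term if it is unknown). */
    Set<String> expand(String term) {
        String key = normalize(term);
        Set<String> result = new LinkedHashSet<>();
        result.add(key);
        result.addAll(groups.getOrDefault(key, Collections.<String>emptySet()));
        return result;
    }

    /** Query-time use: ORs the term and its synonyms together as quoted phrases on one field. */
    String toQuery(String field, String term) {
        return expand(term).stream()
                .map(t -> field + ":\"" + t + "\"")
                .collect(Collectors.joining(" OR "));
    }

    // Trims and lower-cases so that "Ok", "OK" and "ok" hit the same entry.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

After registering `addGroup("USA", "United States of America", "United States")`, calling `toQuery("location", "USA")` produces a string that ORs all three names on the `location` field. Note that this sketch still relies on a dictionary you supply yourself, and a two-letter entry such as "ok" will also match unrelated uses of that word.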