nlp, text-classification, synonym, named-entity-recognition, pattern-synonyms

How to find similar noun phrases in NLP?


Is there a way to identify similar noun phrases? Some suggest using pattern-based approaches, for example "X as Y" expressions:

Usain Bolt as Sprint King

Liverpool as Reds


Solution

  • There are many techniques for finding alternative names for a given entity. One is to scan a large collection of documents (e.g., Wikipedia or newspaper articles) for patterns such as:

    • X also known as Y
    • X also titled as Y
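
    As a rough sketch of the pattern idea in Python: the snippet below treats a run of capitalized words as a crude stand-in for a noun phrase, which is my simplification (a real system would more likely use a chunker or NER tagger). The names CONNECTIVES, NAME and find_aliases are hypothetical, made up for this example.

```python
import re

# Connective phrases from the answer; extend this list for your own corpus.
CONNECTIVES = ["also known as", "also titled as"]

# Crude noun-phrase stand-in (an assumption): one or more capitalized words.
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"

_connective = "|".join(re.escape(c) for c in CONNECTIVES)
PATTERN = re.compile(
    rf"(?P<x>{NAME}),?\s+(?:{_connective})\s+(?:the\s+)?(?P<y>{NAME})"
)

def find_aliases(text):
    """Return (X, Y) pairs for every 'X <connective> Y' match in text."""
    return [(m.group("x"), m.group("y")) for m in PATTERN.finditer(text)]

text = ("Usain Bolt, also known as Sprint King, won again. "
        "Liverpool also known as the Reds played well.")
print(find_aliases(text))
```

    On this made-up text it should pair Usain Bolt with Sprint King and Liverpool with Reds. Expect noise on real documents, since capitalization alone is a weak signal.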

    There are other alternatives as well. One I remember is to use Wikipedia's inter-link structure, for instance by exploring the redirect links between articles. You can download a file with a list of redirects from DBpedia (https://wiki.dbpedia.org/Downloads2015-04). Exploring that file, you can find alternative names/synonyms for entities, e.g.:

    • Kennedy_Centre -> John_F._Kennedy_Center_for_the_Performing_Arts
    • Lord_Alton_of_Liverpool -> David_Alton,_Baron_Alton_of_Liverpool
    • Indiana_jones_2 -> Indiana_Jones_and_the_Temple_of_Doom
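
    To turn such a file into a lookup table, something like the following might work. It assumes the redirects come as N-Triples-style lines of the form <.../resource/ALIAS> <...wikiPageRedirects> <.../resource/TARGET> . I have not checked the exact layout of the 2015-04 dump, so adjust the regular expression if it differs. TRIPLE and load_redirects are names invented for this sketch.

```python
import re
from collections import defaultdict

# Assumed line shape: <.../resource/ALIAS> <predicate> <.../resource/TARGET> .
TRIPLE = re.compile(
    r"<[^>]*/resource/([^>]+)>\s+<[^>]*>\s+<[^>]*/resource/([^>]+)>\s*\."
)

def load_redirects(lines):
    """Map each redirect target to the set of titles that redirect to it."""
    aliases = defaultdict(set)
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m:  # comment lines and anything malformed are skipped
            source, target = m.groups()
            aliases[target].add(source)
    return aliases

sample = [
    "# a comment line",
    "<http://dbpedia.org/resource/Kennedy_Centre> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/John_F._Kennedy_Center_for_the_Performing_Arts> .",
    "<http://dbpedia.org/resource/Indiana_jones_2> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/Indiana_Jones_and_the_Temple_of_Doom> .",
]
redirects = load_redirects(sample)
print(redirects["Indiana_Jones_and_the_Temple_of_Doom"])
```

    For the real file you would pass an open file handle instead of the sample list, since the function only needs an iterable of lines.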

    Another option is to combine the two techniques: look for text segments where both an alias (e.g., Indiana Jones) and its redirect target (Indiana_Jones_and_the_Temple_of_Doom) occur no more than, say, 4 or 5 tokens apart. You might find connecting patterns such as "also titled as", which you can then use to find more synonyms/alternative names.
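
    A minimal sketch of that combination step, assuming simple lowercase word tokenization and a corpus already split into sentences. The function names and the max_gap parameter are hypothetical. Overlapping matches are skipped, because a short alias such as Indiana Jones also occurs inside the full title.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; underscores in titles act as separators."""
    return re.findall(r"[^\W_]+", text.lower())

def occurrences(tokens, phrase):
    """Start indices at which the token list `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]

def connecting_patterns(sentences, alias, canonical, max_gap=5):
    """Count token sequences found between alias and canonical name
    (in either order) when they are 1..max_gap tokens apart."""
    a, c = tokenize(alias), tokenize(canonical)
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in occurrences(tokens, a):
            for j in occurrences(tokens, c):
                if i + len(a) <= j:        # alias comes first
                    gap = tokens[i + len(a):j]
                elif j + len(c) <= i:      # canonical name comes first
                    gap = tokens[j + len(c):i]
                else:                      # the two matches overlap
                    continue
                if 0 < len(gap) <= max_gap:
                    counts[" ".join(gap)] += 1
    return counts

sentences = [
    "Indiana Jones and the Temple of Doom, also titled as Indiana Jones 2, "
    "was released in 1984.",
    "Indiana Jones 2 is a film that many years later critics compared with "
    "Indiana Jones and the Temple of Doom.",
]
print(connecting_patterns(sentences, "Indiana_jones_2",
                          "Indiana_Jones_and_the_Temple_of_Doom"))
```

    The most frequent gaps across many alias/target pairs are the candidate patterns. In the made-up sentences above, only the first one keeps the two names within five tokens of each other.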