Is there a way to identify similar noun phrases? Some suggest using pattern-based approaches, for example "X as Y" expressions:
Usain Bolt as Sprint King
Liverpool as Reds
There are many techniques for finding alternative names for a given entity. One is to scan large collections of documents (e.g., Wikipedia or newspaper articles) for patterns such as:
X also known as Y
X also titled as Y
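A rough sketch of the pattern-based approach, in Python. The regular expression and the `find_aliases` helper are my own illustration, not an established tool, and approximating a noun phrase as a run of capitalized words is a simplifying assumption; a real system would use a proper noun-phrase chunker and many more patterns.

```python
import re

# Assumption: a name is approximated as one or more capitalized words.
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"

# Covers the two patterns above: "X also known as Y" and "X also titled as Y".
ALIAS_RE = re.compile(
    rf"(?P<x>{NAME}),?\s+also\s+(?:known|titled)\s+as\s+(?P<y>{NAME})"
)

def find_aliases(text):
    """Return (entity, alias) pairs matched by the patterns in `text`."""
    return [(m.group("x"), m.group("y")) for m in ALIAS_RE.finditer(text)]

sample = "Fans cheered for Usain Bolt, also known as Lightning Bolt, in Berlin."
print(find_aliases(sample))
```

Run over a large corpus, the same function would be applied sentence by sentence and the resulting pairs counted, so that pairs seen only once can be treated with more suspicion.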
There are other alternatives as well. One I remember is using Wikipedia's inter-link structure, for instance by exploring the redirect links between articles. You can download a file with a list of redirects from here: https://wiki.dbpedia.org/Downloads2015-04. By exploring the file you can find alternative names/synonyms for entities, e.g.:
Kennedy_Centre -> John_F._Kennedy_Center_for_the_Performing_Arts
Lord_Alton_of_Liverpool -> David_Alton,_Baron_Alton_of_Liverpool
Indiana_jones_2 -> Indiana_Jones_and_the_Temple_of_Doom
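Here is a minimal sketch of turning such a redirects file into an alias lookup. It assumes the dump is in N-Triples form, with each line linking a redirecting title to its target through a `wikiPageRedirects` predicate; check the actual file before relying on this. The `build_alias_index` helper is just an illustrative name.

```python
import re
from collections import defaultdict

# Assumption: each redirect line looks like
#   <http://dbpedia.org/resource/ALIAS> <...wikiPageRedirects> <http://dbpedia.org/resource/TARGET> .
TRIPLE_RE = re.compile(
    r"<http://dbpedia\.org/resource/([^>]+)>\s+"
    r"<[^>]*wikiPageRedirects>\s+"
    r"<http://dbpedia\.org/resource/([^>]+)>"
)

def build_alias_index(lines):
    """Map each redirect target to the set of titles that redirect to it."""
    aliases = defaultdict(set)
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m:  # skips comments and anything that is not a redirect triple
            source, target = m.groups()
            aliases[target].add(source.replace("_", " "))
    return aliases

sample_lines = [
    "# a comment line",
    "<http://dbpedia.org/resource/Indiana_jones_2> "
    "<http://dbpedia.org/ontology/wikiPageRedirects> "
    "<http://dbpedia.org/resource/Indiana_Jones_and_the_Temple_of_Doom> .",
]
index = build_alias_index(sample_lines)
print(index["Indiana_Jones_and_the_Temple_of_Doom"])
```

Since the function only iterates over lines, you can pass it an open file object for the downloaded dump instead of a list.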
Another thing you can do is combine these two techniques. For instance, look for text segments where both Indiana Jones and Indiana_Jones_and_the_Temple_of_Doom occur no more than, let's say, 4 or 5 tokens apart. You might find patterns like "also titled as", which you can then use to find more synonyms/alternative names.
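The combined idea could be sketched like this: take known (name, alias) pairs, for example from the redirects, and count the token sequences that sit between them whenever they occur close together. The `connecting_patterns` function, the whitespace tokenization, and the sample sentence are all my own assumptions for illustration.

```python
from collections import Counter

def connecting_patterns(sentences, pairs, max_gap=5):
    """Count token sequences found between a known (name, alias) pair
    when the alias starts at most `max_gap` tokens after the name ends."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization (assumption)
        for name, alias in pairs:
            n, a = name.split(), alias.split()
            for i in range(len(tokens) - len(n) + 1):
                if tokens[i:i + len(n)] != n:
                    continue
                start = i + len(n)
                # try each gap size until the alias is found
                for gap in range(1, max_gap + 1):
                    j = start + gap
                    if tokens[j:j + len(a)] == a:
                        counts[" ".join(tokens[start:j])] += 1
                        break
    return counts

# Made-up sentence, with underscores in the redirect titles replaced by spaces.
sentences = [
    "Indiana Jones 2 also titled as Indiana Jones and the Temple of Doom was released in 1984"
]
pairs = [("Indiana Jones 2", "Indiana Jones and the Temple of Doom")]
patterns = connecting_patterns(sentences, pairs)
print(patterns.most_common(3))
```

The most frequent connecting sequences are candidates for new extraction patterns, which you can then apply to the corpus to find pairs that the redirects do not cover.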