I am looking for a way to tell whether a common noun in a sentence refers to a place. This is easy for proper nouns, but I haven't found a straightforward solution for common nouns.
For example, given the sentence "After a violent and brutal attack, a group of college students travel into the countryside to find refuge from the town they fled, but soon discover that the small village is also home to a coven of serial killers" I would like to mark the following nouns as referring to places: countryside, town, small village, home.
Here is the code I'm using:
import spacy
nlp = spacy.load('en_core_web_lg')
# Process whole documents
text = ("After a violent and brutal attack, a group of college students travel into the countryside to find refuge from the town they fled, but soon discover that the small village is also home to a coven of satanic serial killers")
doc = nlp(text)
# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
This gives the following output:
Noun phrases: ['a violent and brutal attack', 'a group', 'college students', 'the countryside', 'refuge', 'the town', 'they', 'the small village', 'a coven', 'serial killers']
Verbs: ['travel', 'find', 'flee', 'discover']
You can use WordNet for this: check whether any sense of the word is a hyponym of the synset for "location".
from nltk.corpus import wordnet as wn
loc = wn.synsets("location")[0]
def is_location(candidate):
    for ss in wn.synsets(candidate):
        # only get those where the synset matches exactly
        name = ss.name().split(".", 1)[0]
        if name != candidate:
            continue
        hit = loc.lowest_common_hypernyms(ss)
        if hit and hit[0] == loc:
            return True
    return False
# true things
for word in ("countryside", "town", "village", "home"):
    print(is_location(word), word, sep="\t")
# false things
for word in ("cat", "dog", "fish", "cabbage", "knife"):
    print(is_location(word), word, sep="\t")
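If it helps to see what the hypernym check is doing, here is a minimal sketch on a toy graph. The links in `TOY_HYPERNYMS` are hand-written for illustration and are not real WordNet data, and `has_ancestor` is a hypothetical helper, not an NLTK function. The idea is the same, though: walk upwards from a word and see whether you ever reach "location".

```python
# Hypothetical, hand-written hypernym links (child -> parents).
# These are NOT taken from WordNet; they only illustrate the idea.
TOY_HYPERNYMS = {
    "village": ["settlement"],
    "settlement": ["geographical_area"],
    "geographical_area": ["region"],
    "region": ["location"],
    "location": ["entity"],
    "cat": ["feline"],
    "feline": ["animal"],
    "animal": ["entity"],
}


def has_ancestor(word, target, graph=TOY_HYPERNYMS):
    """Walk up the hypernym links and report whether `target` is reached."""
    stack, seen = [word], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        # Words with no entry in the graph simply have no parents.
        stack.extend(graph.get(node, []))
    return False


print(has_ancestor("village", "location"))  # True
print(has_ancestor("cat", "location"))      # False
```

In the real code, `lowest_common_hypernyms` plays this role: if the lowest hypernym shared by "location" and the candidate synset is "location" itself, the candidate sits somewhere below it.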
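Note that WordNet's synsets are sometimes wonky (a word can have an unexpected sense that sits under "location", or be missing the one you want), so be sure to spot-check the results on your own vocabulary.
Also, for phrases like "small village", you'll have to pull out the head noun first; in an English noun phrase that is usually just the last word.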
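Since you already have spaCy noun chunks, you probably don't need to split on whitespace: each chunk has a `.root` token, which is the head noun. Here is a rough sketch of wiring the two together. `location_chunks` and `fake_chunk` are names I made up, and the stand-in data is only there so the snippet runs without a spaCy model or the WordNet corpus.

```python
from types import SimpleNamespace


def location_chunks(chunks, is_location):
    """Return the text of each noun chunk whose head noun passes is_location.

    `chunks` is any iterable of objects exposing `.text` and `.root.lemma_`,
    which is the shape of the spans yielded by spaCy's `doc.noun_chunks`.
    """
    return [
        chunk.text
        for chunk in chunks
        if is_location(chunk.root.lemma_.lower())
    ]


def fake_chunk(text, head):
    """Hypothetical stand-in for a spaCy noun chunk (illustration only)."""
    return SimpleNamespace(text=text, root=SimpleNamespace(lemma_=head))


demo_chunks = [
    fake_chunk("the small village", "village"),
    fake_chunk("a coven", "coven"),
]
# Stand-in for the WordNet-based check, so no corpus download is needed here.
demo_places = {"village", "town", "countryside"}

print(location_chunks(demo_chunks, demo_places.__contains__))
# ['the small village']
```

With the real pipeline you would call `location_chunks(doc.noun_chunks, is_location)` instead. One caveat: in the output you posted, "home" is not among the noun chunks, so to catch it you would probably have to run the check over individual tokens as well.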