I am using spaCy as well as a bit of custom code to do some natural language processing for work. We want to find where a paper was written by using the locations cited in the paper, and I was curious whether there is a package that can find locations such as countries, cities, states, etc. Thanks for your time.
spaCy has named entity recognition (NER). One type of entity that the pre-trained models have is LOC, for location. There's also GPE (geo-political entity) in some of the models. The en_core_web_sm model I use below has both LOC and GPE. (Full listing at https://spacy.io/api/annotation#named-entities). See also: https://spacy.io/usage/linguistic-features#named-entities
It's not going to be perfect out of the box, but it might be useful.
Minimal example:
import spacy # install cmd: pip3 install spacy --user
import en_core_web_sm # install cmd: python3 -m spacy download en_core_web_sm --user
text='San Fransisco is in California and my friend Frank lives there, close to the bay. He purchased his first house last January.'
NLP = en_core_web_sm.load()
output = NLP(text)
for item in output.ents:
    print(item.label_, item)
This has the following output:
GPE San Fransisco
GPE California
PERSON Frank
DATE last January
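To get from there to a guess about where a paper was written, one option is to keep only the GPE and LOC entities and count how often each one appears. Below is a minimal sketch of that step, not a tested method for the whole problem. The helper name count_locations and the choice of labels are my own assumptions, not part of spaCy, and the sample input is just the output above written as (label, text) pairs so the snippet runs without a model installed.

```python
from collections import Counter

# Entity labels treated as locations (an assumption: GPE and LOC only).
LOCATION_LABELS = {"GPE", "LOC"}


def count_locations(entities):
    """Count location mentions in an iterable of (label, text) pairs.

    Hypothetical helper, not a spaCy function.
    """
    counts = Counter()
    for label, text in entities:
        if label in LOCATION_LABELS:
            counts[text.strip()] += 1
    return counts


# The entities printed in the example above, as (label, text) pairs.
sample = [
    ("GPE", "San Fransisco"),
    ("GPE", "California"),
    ("PERSON", "Frank"),
    ("DATE", "last January"),
]

location_counts = count_locations(sample)

# Most frequently mentioned locations first.
print(location_counts.most_common())
```

With spaCy itself, you could feed it the real entities with count_locations((ent.label_, ent.text) for ent in output.ents). The most frequent location is only a rough guess at where the paper was written, so treat it as a starting point.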