python, nlp, spacy

Is there an NLP package or function that can find locations in a document?


I am using spaCy, along with a bit of custom code, to do some natural language processing for work. We want to work out where a paper was written from the locations cited in it, and I was curious whether there is a package that can find locations such as countries, cities, states, etc. Thanks for your time.


Solution

  • spaCy has named entity recognition (NER). One entity type in the pre-trained models is LOC, for non-political locations such as mountain ranges and bodies of water. Some models also have GPE (geopolitical entity), which covers countries, cities, and states. The en_core_web_sm model I use below has both LOC and GPE. (Full listing at https://spacy.io/api/annotation#named-entities). See also: https://spacy.io/usage/linguistic-features#named-entities

    It's not going to be perfect out of the box, but it might be useful.

    Minimal example:

    import spacy          # install cmd: pip3 install spacy --user
    import en_core_web_sm # install cmd: python3 -m spacy download en_core_web_sm --user
    
    text = 'San Fransisco is in California and my friend Frank lives there, close to the bay. He purchased his first house last January.'
    nlp = en_core_web_sm.load()  # load the pre-trained English pipeline
    doc = nlp(text)              # run the pipeline, including NER, on the text
    # Print each recognized entity with its label (GPE, LOC, PERSON, DATE, ...)
    for ent in doc.ents:
        print(ent.label_, ent)
    

    This produces the following output:

    GPE San Fransisco
    GPE California
    PERSON Frank
    DATE last January
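
To get from the labelled entities to a list of candidate locations for a paper, one option is to keep only the GPE and LOC entities and count how often each is mentioned. The following is a minimal sketch, not a tested solution for this use case: the helper names `count_locations` and `locations_in_text` and the `LOCATION_LABELS` set are made up for illustration, and treating the most-mentioned place as the place where the paper was written is only a heuristic.

```python
from collections import Counter

# Labels treated as locations (an assumption for this sketch; GPE and LOC
# are the two location-like labels mentioned in the answer above).
LOCATION_LABELS = {"GPE", "LOC"}


def count_locations(ents, labels=LOCATION_LABELS):
    """Count how often each location-like entity text appears.

    `ents` is any iterable of objects with `.text` and `.label_`
    attributes, such as `doc.ents` from a spaCy Doc.
    """
    return Counter(ent.text for ent in ents if ent.label_ in labels)


def locations_in_text(text, model_name="en_core_web_sm"):
    """Run spaCy NER on `text` and count the location entities found."""
    import spacy  # imported lazily so count_locations works without spaCy

    nlp = spacy.load(model_name)
    return count_locations(nlp(text).ents)
```

Calling `locations_in_text(paper_text).most_common(3)` would then list the three most frequently mentioned places. Counts are keyed on the exact entity text, so variants such as "USA" and "United States" are counted separately unless you normalize them first.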