Tags: python, geolocation, nlp, nltk, named-entity-recognition

How can I extract GPE(location) using NLTK ne_chunk?


I am trying to write code that checks the weather conditions for a particular area, using the OpenWeatherMap API for the weather data and NLTK for named-entity recognition. However, I cannot work out how to pass the entity labelled GPE (the location, in this case Chicago) to my API request. Kindly help me with the syntax. The code is given below.

Thank you for your assistance.

import nltk
import requests
from nltk import word_tokenize
from nltk.corpus import stopwords

sentence = "What is the weather in Chicago today? "
tokens = word_tokenize(sentence)

stop_words = set(stopwords.words('english'))

# Drop stopwords before tagging
clean_tokens = [w for w in tokens if w not in stop_words]

tagged = nltk.pos_tag(clean_tokens)

print(nltk.ne_chunk(tagged))

Solution

  • GPE is the label of a Tree object (a subtree) in the output of the pre-trained ne_chunk model.

    >>> from nltk import word_tokenize, pos_tag, ne_chunk
    >>> sent = "What is the weather in Chicago today?"
    >>> ne_chunk(pos_tag(word_tokenize(sent)))
    Tree('S', [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]), ('today', 'NN'), ('?', '.')])
    

    To traverse the tree, see "How to Traverse an NLTK Tree object?"
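
    As a minimal sketch of such a traversal, assuming the chunk tree printed above: NLTK's Tree.subtrees() accepts a filter function, so the GPE nodes can be picked out by label. The helper name extract_entities is hypothetical, and the tree is built by hand here so that the example runs without the tagger and chunker models.

```python
from nltk import Tree

# The chunk tree from the output above, built by hand (no model download needed).
chunked = Tree('S', [
    ('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('in', 'IN'),
    Tree('GPE', [('Chicago', 'NNP')]),
    ('today', 'NN'), ('?', '.'),
])

def extract_entities(tree, label):
    """Return the text of every subtree whose label equals `label` (hypothetical helper)."""
    entities = []
    # subtrees() walks the whole tree; the filter keeps only nodes with the wanted label.
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        # leaves() yields (token, pos) pairs; join the tokens into one string.
        entities.append(" ".join(token for token, pos in subtree.leaves()))
    return entities

locations = extract_entities(chunked, 'GPE')
print(locations)  # ['Chicago']
```

    In real use, the hand-built tree would be replaced by the result of ne_chunk(pos_tag(word_tokenize(sent))).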

    Perhaps you're looking for a slight modification of the answer to "NLTK Named Entity recognition to a Python list":

    from nltk import word_tokenize, pos_tag, ne_chunk
    from nltk import Tree
    
    def get_continuous_chunks(text, label):
        # Tokenize, POS-tag, and chunk the text into a tree of named entities.
        chunked = ne_chunk(pos_tag(word_tokenize(text)))
        continuous_chunk = []
    
        for subtree in chunked:
            # Named entities are Tree nodes; ordinary tokens are (word, tag) tuples.
            if isinstance(subtree, Tree) and subtree.label() == label:
                named_entity = " ".join(token for token, pos in subtree.leaves())
                # Keep each entity once, in order of first appearance.
                if named_entity not in continuous_chunk:
                    continuous_chunk.append(named_entity)
    
        return continuous_chunk
    

    [out]:

    >>> sent = "What is the weather in New York today?"
    >>> get_continuous_chunks(sent, 'GPE')
    ['New York']
    
    >>> sent = "What is the weather in New York and Chicago today?"
    >>> get_continuous_chunks(sent, 'GPE')
    ['New York', 'Chicago']
    
    >>> sent = "What is the weather in New York"
    >>> get_continuous_chunks(sent, 'GPE')
    ['New York']
    
    >>> sent = "What is the weather in New York and Chicago"
    >>> get_continuous_chunks(sent, 'GPE')
    ['New York', 'Chicago']
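
    To pass the extracted locations to the API request, one option is to put each name in the q parameter of OpenWeatherMap's current-weather endpoint. The sketch below is only an illustration under assumptions: build_weather_url is a hypothetical helper, YOUR_API_KEY is a placeholder for your own key, and units="metric" is an arbitrary choice. It only builds the URL, so it runs without network access.

```python
from urllib.parse import urlencode

# Current-weather endpoint of the OpenWeatherMap API.
OWM_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_weather_url(location, api_key, units="metric"):
    """Build the request URL for one location name (hypothetical helper)."""
    # urlencode escapes spaces and other special characters in the city name.
    params = {"q": location, "appid": api_key, "units": units}
    return OWM_URL + "?" + urlencode(params)

# Locations as returned by get_continuous_chunks(sent, 'GPE').
locations = ["New York", "Chicago"]

for location in locations:
    url = build_weather_url(location, api_key="YOUR_API_KEY")  # placeholder key
    print(url)
    # With the requests library, the call would then be along the lines of:
    #   response = requests.get(url)
    #   data = response.json()
```

    Since get_continuous_chunks can return an empty list when no GPE is found, it is worth checking for that case before sending any request.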