
How to identify nouns in a string and capitalize them?


I have plain text in lower case with no punctuation. Is there a library that can restore upper case where it is required, for example for nouns, or for names such as the one after "Mr."? Any solution or hint would be very helpful. For example, "english" in "in english language" should become "English". The plain text contains names in several places, and they need to be capitalized. For example:

mr. john is living in canada

to

Mr. John is living in Canada

Solution

  • Here is a workaround that uses the pos_tag function from the NLTK library to identify nouns:

    # Import the NLTK tokenizer and part-of-speech tagger
    # (their models may have to be downloaded once with nltk.download();
    # the exact resource names depend on the NLTK version)
    from nltk.tokenize import word_tokenize
    from nltk.tag import pos_tag
    
    text = "mr. john is living in canada"
    
    # Tokenize the string and tag each token with its part of speech
    def tag_tokens(sentence):
        tokens = word_tokenize(sentence)
        return pos_tag(tokens)
    
    sent = tag_tokens(text)
    
    # This returns a list of (token, tag) tuples
    print(sent)
    # [('mr.', 'NN'), ('john', 'NN'), ('is', 'VBZ'), ('living', 'VBG'), ('in', 'IN'), ('canada', 'NN')]
    
    # Collect the nouns (Penn Treebank noun tags all start with "NN")
    nn = {word for word, tag in sent if tag.startswith("NN")}
    
    # Capitalize every word that appears in the noun set
    text_cap = " ".join(x.capitalize() if x in nn else x for x in text.split())
    print(text_cap)
    # Mr. John is living in Canada
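A minimal variation, sketched under the assumption that the tagger's output is already available: instead of checking each word of the original string against a list of nouns, it walks the tagged tokens in order and capitalizes only the positions tagged as nouns, so a word that is a noun in one place is not capitalized everywhere. The function name `capitalize_tagged_nouns` is hypothetical, and the tagged list is the output shown in the answer, hard-coded so the sketch runs without NLTK or its models.

```python
# Sketch: capitalize tokens whose Penn Treebank tag marks them as a noun.
# Works on (token, tag) pairs such as those returned by nltk.pos_tag,
# so it can be run without downloading any NLTK models.
# capitalize_tagged_nouns is a hypothetical helper name.


def capitalize_tagged_nouns(tagged):
    """Join the tokens with spaces, capitalizing every token whose tag
    starts with "NN" (NN, NNS, NNP, NNPS)."""
    words = []
    for token, tag in tagged:
        if tag.startswith("NN"):
            words.append(token.capitalize())
        else:
            words.append(token)
    return " ".join(words)


# Tagged output copied from the answer's example
tagged = [('mr.', 'NN'), ('john', 'NN'), ('is', 'VBZ'),
          ('living', 'VBG'), ('in', 'IN'), ('canada', 'NN')]

print(capitalize_tagged_nouns(tagged))
# Mr. John is living in Canada
```

Because it joins the tokenizer's tokens with single spaces, punctuation that word_tokenize splits off (for example a sentence-final period) ends up separated by a space; for the unpunctuated input in this question that does not matter. Note also that any tag-based approach capitalizes common nouns (anything tagged NN or NNS), which English does not normally do.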
    

Hope this works!
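The question also asks specifically about names after "Mr." and words such as "English". For those narrow cases a tagger is not strictly needed. A minimal rule-based sketch, assuming hand-maintained word lists (the `HONORIFICS` and `PROPER_NOUNS` sets and the `restore_case` function are hypothetical, not part of NLTK), could look like this:

```python
# Sketch: rule-based capitalization for lower-case text without a tagger.
# HONORIFICS and PROPER_NOUNS are hypothetical, hand-written word lists.

HONORIFICS = {"mr.", "mrs.", "ms.", "dr."}
PROPER_NOUNS = {"canada", "english"}


def restore_case(text):
    words = text.split()
    result = []
    for i, word in enumerate(words):
        # True when the previous word is a title such as "mr."
        after_honorific = i > 0 and words[i - 1] in HONORIFICS
        if (
            word in HONORIFICS          # "mr." -> "Mr."
            or after_honorific          # name after "mr." -> "John"
            or word in PROPER_NOUNS     # known names -> "Canada"
        ):
            result.append(word.capitalize())
        else:
            result.append(word)
    return " ".join(result)


print(restore_case("mr. john is living in canada"))
# Mr. John is living in Canada
```

This only knows the words in its lists, so in practice the lists would have to be large (or replaced by a named-entity recognizer). It also does not capitalize the first word of a sentence, because the input has no punctuation to mark sentence boundaries.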