I have a dataset containing a list of sentences that include both proper nouns and common nouns. Example -
The casing of the dataset can also be mixed.
I want to extract all the proper nouns and the corresponding sentences where they appear, in two separate columns -
Is there any way to do this in Python? I am quite new to NLP concepts and to Python overall. Thanks!
You can try any NLP library or model, such as spaCy or NLTK, as mentioned by @ivanp.
Here I have used a spaCy model:
import string

import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")  # load the pretrained English model

def proper_noun_extraction(x):
    """Return (proper nouns joined by spaces, original sentence) for one sentence."""
    # capwords normalizes mixed casing by capitalizing every word before tagging
    doc = nlp(string.capwords(x))
    prop_noun = [tok.text for tok in doc if tok.pos_ == 'PROPN']
    if prop_noun:
        return (' '.join(prop_noun), x)
    return ('no proper noun found', None)

# df is assumed to be an existing DataFrame with a 'sentence' column
tuple_noun_sent = df['sentence'].apply(proper_noun_extraction)
resultant_df = pd.DataFrame(tuple_noun_sent.tolist(), columns=['proper_noun', 'sentence'])
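
To see what the casing step does on its own, here is a minimal sketch using only the standard library. The helper name `normalize_case` and the sample sentences are made up for illustration; they are not part of the code above.

```python
import string

def normalize_case(sentence: str) -> str:
    """Capitalize every whitespace-separated word and lowercase the rest of it."""
    return string.capwords(sentence)

# Hypothetical mixed-case inputs, similar in spirit to the dataset described.
samples = ["i visited PARIS last summer", "john Works at google"]
normalized = [normalize_case(s) for s in samples]

for before, after in zip(samples, normalized):
    print(f"{before!r} -> {after!r}")
```

Because every word ends up capitalized, not just the names, the tagger may label some ordinary words as PROPN as well. It is probably worth checking the output on a sample of your data, and trying the raw sentence without `string.capwords` if you see too many false positives.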