I am trying to extract named entities using the first answer to this question, and my code is as follows:
for i in df['Article'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        print(entity.text)
But it does not print any entities. I have tried print(i) and print(doc); both variables have values, and df['Article'] contains news text. Can someone help me understand why the second loop is not extracting entities? Thank you.
EDIT:
This is the dataset file; please run the following code to reproduce the preprocessing that I have done.
df.iloc[:,0].dropna(inplace=True)
df = df[df.iloc[:,0].notna()]
To remove special characters from df['Article']:
df['Article'] = df['Article'].map(lambda x: re.sub(r'\W+', '', x))
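With df['Article'].map(lambda x: re.sub(r'\W+', '', x)), you remove every non-word character from your articles, including all whitespace. Each article collapses into one long run of letters and digits with no word boundaries, which is most likely why doc.ents comes back empty.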
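You need to use
df['Article'] = df['Article'].str.replace(r'(?:_|[^\w\s])+', '', regex=True)
Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default. With that regex, you remove only underscores and other special characters, and the whitespace is kept.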
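To see the difference between the two patterns without loading the dataset, here is a minimal sketch using only the standard re module. The sample sentence is made up for illustration and is not taken from the asker's data:

```python
import re

# Hypothetical sample sentence, standing in for one news article.
text = "Apple Inc. hired Tim Cook in 1998, in Cupertino!"

# Pattern from the question: removes every non-word character,
# whitespace included, so the words run together.
collapsed = re.sub(r'\W+', '', text)

# Pattern from the answer: removes underscores and punctuation
# but leaves whitespace in place.
cleaned = re.sub(r'(?:_|[^\w\s])+', '', text)

print(collapsed)  # AppleInchiredTimCookin1998inCupertino
print(cleaned)    # Apple Inc hired Tim Cook in 1998 in Cupertino
```

The second string still has spaces between the words, so a tokenizer can split it before entity recognition runs.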