I am having some difficulty selecting non-empty fields with regex (findall) in my dataframe. I am looking for words contained in a text source:
text = "Be careful otherwise police will capture you quickly."
I need to find the words that end with "ful" in my text string, and then look for words that end with "ful" in my dataset:
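A minimal sketch of this first step with Python's standard re module. The \b word boundaries are an assumption about what "ends with ful" should mean: without them, r"\w+ful" also matches the beginning of longer words such as "carefully".

```python
import re

text = "Be careful otherwise police will capture you quickly."

# \b on both sides keeps whole words only, so "carefully" would NOT
# yield "careful" (plain r"\w+ful" would return "careful" from it)
ful_words = re.findall(r"\b\w+ful\b", text)
print(ful_words)  # ['careful']
```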
Author  DF_Text
31      Better the devil you know than the one you don't
53      Beware the door with too many keys.
563     Be careful what you tolerate. You are teaching people how to treat you.
41      Fear the Greeks bearing gifts.
539     NaN
51      The honey is sweet but the bee has a sting.
21      Be careful what you ask for; you may get it.
(The data is loaded from a csv/txt file.)
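Since the csv/txt file itself is not available, here is a sketch that rebuilds the sample data inline so the snippets can be run. The exact layout of the original file is unknown; with a real file something like pd.read_csv("quotes.csv") would be used instead, where "quotes.csv" is a hypothetical file name.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copy of the sample data above, so examples run
# without the original file; the missing text for author 539 is NaN
df = pd.DataFrame({
    "Author": [31, 53, 563, 41, 539, 51, 21],
    "DF_Text": [
        "Better the devil you know than the one you don't",
        "Beware the door with too many keys.",
        "Be careful what you tolerate. You are teaching people how to treat you.",
        "Fear the Greeks bearing gifts.",
        np.nan,
        "The honey is sweet but the bee has a sting.",
        "Be careful what you ask for; you may get it.",
    ],
})
print(df)
```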
I need to extract the words ending with "ful" in text, then check DF_Text (and thus Author) for rows that contain words ending with "ful", appending the results to a list.
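One reading of this task is "keep the authors whose text shares a 'ful' word with the reference text". A sketch of that interpretation, assuming a set intersection is the intended comparison (it may be enough to keep any row with a "ful" word at all):

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question (the original csv is not available)
df = pd.DataFrame({
    "Author": [31, 53, 563, 41, 539, 51, 21],
    "DF_Text": [
        "Better the devil you know than the one you don't",
        "Beware the door with too many keys.",
        "Be careful what you tolerate. You are teaching people how to treat you.",
        "Fear the Greeks bearing gifts.",
        np.nan,
        "The honey is sweet but the bee has a sting.",
        "Be careful what you ask for; you may get it.",
    ],
})
text = "Be careful otherwise police will capture you quickly."

pattern = r"\b\w+ful\b"
# step 1: the "ful" words in the reference text
text_words = set(re.findall(pattern, text))

# step 2: authors whose DF_Text contains at least one of those words
results = []
for author, row_text in zip(df["Author"], df["DF_Text"]):
    if not isinstance(row_text, str):  # skip NaN (a float), which re cannot search
        continue
    common = text_words.intersection(re.findall(pattern, row_text))
    if common:  # skip rows with no shared word (empty set)
        results.append((int(author), sorted(common)))

print(results)  # [(563, ['careful']), (21, ['careful'])]
```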
import re

n = 0
for i in df['DF_Text']:
    print(re.findall(r"\w+ful", i))
    n = n + 1
print(n)
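As written, this loop raises TypeError ("expected string or bytes-like object") when it reaches the NaN cell, because NaN is a float. A sketch of the same loop that skips NaN, drops empty results and keeps the author, rebuilding the sample data inline as an assumption:

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question (the original csv is not available)
df = pd.DataFrame({
    "Author": [31, 53, 563, 41, 539, 51, 21],
    "DF_Text": [
        "Better the devil you know than the one you don't",
        "Beware the door with too many keys.",
        "Be careful what you tolerate. You are teaching people how to treat you.",
        "Fear the Greeks bearing gifts.",
        np.nan,
        "The honey is sweet but the bee has a sting.",
        "Be careful what you ask for; you may get it.",
    ],
})

matches = []  # (Author, words) pairs for rows that contain a "ful" word
n = 0         # number of rows actually searched
for author, row_text in zip(df["Author"], df["DF_Text"]):
    if pd.isna(row_text):  # re.findall raises TypeError on NaN
        continue
    n += 1
    found = re.findall(r"\w+ful", row_text)
    if found:  # an empty list means no match: skip it
        matches.append((int(author), found))

print(matches)  # [(563, ['careful']), (21, ['careful'])]
print(n)        # 6
```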
My question is: how can I remove the empty results ([]) and the NaN rows from the analysis, and report the related author names (e.g. 563, 21)?
I am happy to provide further information if anything is unclear.
Use str.findall instead of looping with re.findall:
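# one list of "ful" words per row; NaN rows stay NaN
df["found"] = df["DF_Text"].str.findall(r"(\w+ful)")
# rows with no match (empty list) get the Author value instead
df.loc[df["found"].str.len().eq(0), "found"] = df["Author"]
print(df)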
   Author                                            DF_Text      found
0      31   Better the devil you know than the one you don't         31
1      53                Beware the door with too many keys.         53
2     563  Be careful what you tolerate. You are teaching...  [careful]
3      41                     Fear the Greeks bearing gifts.         41
4     539                                                NaN        NaN
5      51        The honey is sweet but the bee has a sting.         51
6      21       Be careful what you ask for; you may get it.  [careful]
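If the goal is to drop the empty and NaN rows rather than fill them with the Author value, a variant of the same str.findall approach filters on the list length instead. A sketch, again rebuilding the sample data inline:

```python
import numpy as np
import pandas as pd

# Sample data from the question (the original csv is not available)
df = pd.DataFrame({
    "Author": [31, 53, 563, 41, 539, 51, 21],
    "DF_Text": [
        "Better the devil you know than the one you don't",
        "Beware the door with too many keys.",
        "Be careful what you tolerate. You are teaching people how to treat you.",
        "Fear the Greeks bearing gifts.",
        np.nan,
        "The honey is sweet but the bee has a sting.",
        "Be careful what you ask for; you may get it.",
    ],
})

# NaN cells would give NaN instead of a list, so drop them first
found = df["DF_Text"].dropna().str.findall(r"\w+ful")
# keep only rows where at least one word matched (non-empty list)
found = found[found.str.len().gt(0)]

# align back to the Author column through the shared index
report = df.loc[found.index, ["Author"]].assign(found=found)
print(report)
print(report["Author"].tolist())  # [563, 21]
```

Filtering keeps the original index (2 and 6 here), so each match can still be traced back to its row in df.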