python regex pandas text-mining

Remove empty rows within a dataframe and check similarity


I am having trouble selecting non-empty results when using regex (findall) on my dataframe to look for words that also appear in a text source:

text = "Be careful otherwise police will capture you quickly."

I need to find the words that end in ful in my text string, then look for words that end in ful in my dataset.

Author   DF_Text
31       Better the devil you know than the one you don't
53       Beware the door with too many keys.
563      Be careful what you tolerate. You are teaching people how to treat you.
41       Fear the Greeks bearing gifts.
539      NaN
51       The honey is sweet but the bee has a sting.
21       Be careful what you ask for; you may get it.

(The data comes from a csv/txt file.) I need to extract the words ending in ful from text, then find the rows of DF_Text (and hence the Author values) that contain words ending in ful, and append the results to a list.

import re

n = 0
for i in df['DF_Text']:
    print(re.findall(r"\w+ful", i))
    n = n + 1
    print(n)

My question is: how can I exclude the empty results ([]) and the NaN rows from the analysis, and report the related author names (e.g. 563, 21)? I am happy to provide further information if anything is unclear.


Solution

  • Use str.findall instead of looping with re.findall:

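    # Extract every word ending in "ful"; rows with NaN text stay NaN
    df["found"] = df["DF_Text"].str.findall(r"\w+ful\b")

    # Where nothing matched (empty list), store the Author value instead
    df.loc[df["found"].str.len().eq(0), "found"] = df["Author"]

    print(df)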
    
       Author                                            DF_Text      found
    0      31   Better the devil you know than the one you don't         31
    1      53                Beware the door with too many keys.         53
    2     563  Be careful what you tolerate. You are teaching...  [careful]
    3      41                     Fear the Greeks bearing gifts.         41
    4     539                                                NaN        NaN
    5      51        The honey is sweet but the bee has a sting.         51
    6      21       Be careful what you ask for; you may get it.  [careful]
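
If the goal is instead to drop the NaN and empty rows and report only the authors whose text contains a matching word, something along these lines may work. This is a minimal sketch, not the answer's own method: the dataframe is rebuilt by hand from the sample in the question, the names pattern, text_words, matches and authors are made up for illustration, and the \b added to the pattern assumes that only words ending in ful should match.

```python
import re

import pandas as pd

text = "Be careful otherwise police will capture you quickly."

# Sample data rebuilt by hand from the question (assumption: Author is numeric).
df = pd.DataFrame(
    {
        "Author": [31, 53, 563, 41, 539, 51, 21],
        "DF_Text": [
            "Better the devil you know than the one you don't",
            "Beware the door with too many keys.",
            "Be careful what you tolerate. You are teaching people how to treat you.",
            "Fear the Greeks bearing gifts.",
            float("nan"),
            "The honey is sweet but the bee has a sting.",
            "Be careful what you ask for; you may get it.",
        ],
    }
)

# \b anchors the match at the end of a word, so "carefully" would not count.
pattern = r"\w+ful\b"

# Words ending in "ful" in the reference string.
text_words = {w.lower() for w in re.findall(pattern, text)}

# str.findall returns NaN for missing text; dropna() removes those rows.
found = df["DF_Text"].str.findall(pattern).dropna()

# Keep only the rows whose list of matches is non-empty.
found = found[found.str.len() > 0]

# Pair each remaining row with its author (aligned on the index).
matches = df.loc[found.index, ["Author"]].assign(found=found)

# Authors whose text shares at least one "ful" word with the reference string.
shared = matches["found"].apply(
    lambda words: bool(text_words & {w.lower() for w in words})
)
authors = matches.loc[shared, "Author"].tolist()

print(matches)
print(authors)
```

Filtering on the result of str.findall keeps the original index, so the Author column can be looked up afterwards without a loop or a counter.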