Search code examples
pythonpandasdataframedata-analysis

Mapping keywords with a dataframe columns using pandas in python


I have a dataframe,

DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list,

my_list=["one","driver"]

I tried, names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']

achieved everything except the keyvalue column.

 output_DF.
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player

My desired output is,
desired_DF,
Name    Stage   Description                                 keyvalue
Sri     1       Sri is one of the good singer in this two    one
        2       Thanks for reading                           
Ram     1       Ram is one of the good cricket player        one
ganesh  1       good driver                                  driver

some one help me with generating keyvalue column


Solution

  • I think you can use previous solution from here and then extract:

    pat = "|".join(my_list)
    
    df['keyvalue'] = df['Description'].str.extract("(" + pat + ')', expand=False).fillna('')
    print (df)
         Name  Stage                                Description keyvalue
    0     Sri      1  Sri is one of the good singer in this two      one
    1     Sri      2                         Thanks for reading         
    2     Ram      1      Ram is one of the good cricket player      one
    3  ganesh      1                                good driver   driver
    

    All together:

    print (df)
    #     Name  Stage                                Description
    #0     Sri      1  Sri is one of the good singer in this two
    #1              2                         Thanks for reading
    #2     Ram      1      Ram is one of the good cricket player
    #3  ganesh      1                            good Driver one
    
    my_list=["ONE","driver"]
    df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()
    
    pat = "|".join(my_list).lower()
    
    names=df.loc[df["Description"].str.lower().str.contains(pat,na=False), 'Name']
    
    df = df[df['Name'].isin(names)]
    
    df['keyvalue'] = (df['Description'].str.lower()
                                       .str.extract("(" + pat + ')', expand=False)
                                       .fillna(''))
    print (df)
    #     Name  Stage                                Description keyvalue
    #0     Sri      1  Sri is one of the good singer in this two      one
    #1     Sri      2                         Thanks for reading         
    #2     Ram      1      Ram is one of the good cricket player      one
    #3  ganesh      1                            good Driver one   driver