Search code examples
pythonpandasindexingmatchvlookup

Matching two dataframes and counting the times a matched row appeared in the first dataframe


I have two dataframes (df1 and df2).

df1 looks like this:

A     B      C  

Girl  25     APPLE
Boy   10     SAMSUNG
Girl    10    LG
Boy   5      Ap
Boy  68      SAM

df2 looks like this:

D           E        

APPLE      Ap 
SAMSUNG    Sam
LG         lg
GOOGLE     Go

I want to do index-match between these two so I can produce a new dataframe called df3.

if either column D or E (in df2) can be found in df1, it has to count them and reflect it on the newly produced dataframe, df3.

df3 should look like this:

A     Count

Girl   2
Boy    3
 

Solution

  • Filter with isin and count with value_counts. Optionally make the comparison case insensitive with str.casefold:

    out = (df1.loc[df1['C'].str.casefold()
                   .isin(df2[['D', 'E']].stack().str.casefold()),
                  'A']
              .value_counts()
              .reset_index(name='Count')
           )
    

    Output:

          A  Count
    0  Girl      2
    1   Boy      3