Search code examples
python-3.xpandasnumpyapplysapply

For every element in a list a, how to count how many times it appear in one specific column in another dataframe


For every element in a dict a, I need to count how many times the element in 'age' column appears in one specific column of another dataframe in pandas For example , I have a dict below:

a={'age':[22,38,26],'no':[1,2,3]} 

and I have another dataframe with a few columns

TableB= {'name': ['Braund', 'Cummings', 'Heikkinen', 'Allen'],
 'age': [22,38,26,35,41,22,38],
 'fare': [7.25, 71.83, 0 , 8.05,7,6.05,6], 
 'survived?': [False, True, True, False, True, False, True]}

I would like to know how many times every element in dict a appears in the column 'age' in TableB. The result I expect is c={'age':[22,38,26],'count':[2,2,1]}

I have tried apply function but it does not work. It comes with syntax error, I'm new to Pandas, could anyone please help with that? Thank you!

 def myfunction(y):
    seriesObj = TableB.apply(lambda x: True if y in list(x) else False, axis=1)
    numOfRows = len(seriesObj[seriesObj == True].index)
    return numofRows
 c['age']=a['age']
 c['count']=a['age'].apply(myfunction)

I would like to know how many times every element in list a appears in the column 'age' in TableB. The result should be c={'age':[22,38,26],'count':[2,2,1]}


Solution

  • You can just use merging of data frames to filter out the values that don't appear in a and just count the values.

    import pandas as pd
    a={'age':[22,38,26],'no':[1,2,3]} 
    
    TableB= {'name': ['Braund', 'Cummings', 'Heikkinen', 'Allen', 'Jones', 'Davis', 'Smith'],
     'age': [22,38,26,35,41,22,38],
     'fare': [7.25, 71.83, 0 , 8.05,7,6.05,6], 
     'survived?': [False, True, True, False, True, False, True]}
    
    df_a = pd.DataFrame(a)
    df_tb = pd.DataFrame(TableB)
    (pd.merge(df_tb, df_a, on='age')['age']
       .value_counts()
       .rename('count')
       .rename_axis('age')
       .reset_index()
       .to_dict(orient='list'))
    {'age': [22, 38, 26], 'count': [2, 2, 1]}