Search code examples
pythonpandasdataframecount

Dataframe - Count the frequency of variables in three columns with similar(same) variables in


I have a dataframe, where I split the string variables in to new columns as they were in one column as strings and separated by a ','.

Now I would like to count the three columns and aggregate the frequency of each variable.

The below is the code I'm using to count the columns individually.

Split string into three columns

df6 = df5.join(
        df['Key Features'].str.split(',', expand=True).rename(
        columns={0:'Key Features A', 1:'Key Features B', 2:'Key Features C'}

      )
     )

Count Variables of individual columns

df8['Key Features A'].value_counts()
df8['Key Features B'].value_counts()
df8['Key Features C'].value_counts()

'''

Thanks!


Solution

  • If I understand what you require: each value_counts() is a Dictionary and you want to combine the 3 Dictionaries by adding the count values. If so the function show below does that with a simple example:

    import pandas as pd
    
    df= pd.DataFrame({'x': [1, 2, 3, 4, 2, 3, 1],
                      'y': [1, 2, 1, 4, 2, 5, 1],
                      'z': [2, 3, 4, 1, 1,  1, 1]
                      })
    
    x= df['x'].value_counts()
    y = df['y'].value_counts()
    z = df['z'].value_counts()
    
    def combine(*a):
        s = {}
        for b in a:
            for k, v in b.items():
                s[k] = s.get(k,0) + v
        return s
        
    print(combine(x, y, z))
    

    gives:

    {1: 9, 2: 5, 3: 3, 4: 3, 5: 1}
    

    which is Dictionary showing the total counts.