Search code examples
pythonpandasgroup-by

Pandas : groupby() different columns in a single step


I have a dataframe as below. "feature1", "feature2" and "feature3" are independent features. The possible values in these features as 1, 2, 3, 4 only. I need to find the size, grouped by the values for each features. Shown below is how I'm doing it now, for each feature separately. Is there a way to do it in a single step or more optimal way ? Thank you.

import pandas as pd
df = pd.DataFrame({'State'    : ['AZ', 'FL', 'AZ', 'FL', 'FL', 'FL', 'AZ', 'FL', 'AZ', 'AZ', 'FL', 'AZ', 'FL', 'AZ', 'AZ'],
               'feature1' : [1, 3, 2, 2, 4, 3, 3, 1, 2, 4, 3, 1, 1, 1, 1],
               'feature2' : [1, 2, 1, 2, 3, 2, 4, 1, 2, 4, 3, 1, 1, 1, 1],
               'feature3' : [1, 4, 1, 2, 1, 4, 3, 3, 2, 4, 3, 1, 1, 1, 1],})
(
    df
    .groupby(['State', 'feature1'])
    ['feature1']
    .size()
    .to_frame('N')
 )

Output for feature1:

enter image description here


Solution

  • Option 1: melt + value_counts

    df.melt('State', var_name='feature').value_counts().sort_index()
    

    Result

    State  feature   value
    AZ     feature1  1        4
                     2        2
                     3        1
                     4        1
           feature2  1        5
                     2        1
                     4        2
           feature3  1        5
                     2        1
                     3        1
                     4        1
    FL     feature1  1        2
                     2        1
                     3        3
                     4        1
           feature2  1        2
                     2        3
                     3        2
           feature3  1        2
                     2        1
                     3        2
                     4        2
    dtype: int64
    

    Option 2: melt + crosstab

    s = df.melt('State', var_name='feature')
    pd.crosstab(s['State'], [s['feature'], s['value']])
    

    Result

    feature        feature1          feature2          feature3         
    value          1  2  3  4        1  2  3  4        1  2  3  4
    State                                                        
    AZ             4  2  1  1        5  1  0  2        5  1  1  1
    FL             2  1  3  1        2  3  2  0        2  1  2  2