Search code examples
pythonpython-3.xpandaspandas-groupbypandas-melt

Transpose or melt to group by columns?


I have this data frame:

df = pd.DataFrame({ "cluster" : ["A", "A", "A", "A", "A", "C", "C", "C", "C"], 
                    "col1": ["0", "0", "1", "1", "0", "1", "0", "1", "0"], 
                    "col2": ["1", "1", "1", "1", "0", "1", "1", "1", "0"] })

and I can't figure out the best way to have a data frame result with this format

col cluster avg
col1 A 0.4
col1 C 0.5
col2 A 0.8
col2 C 0.75

I tried using transpose, then melt the cluster, but didn't work. Any help would be highly appreciated!

My desired result is the table - where all columns (except cluster) are rows, and cluster is melt and the average is calculated for the values of the column for each cluster


Solution

  • Looks like a combination of melt and groupby + mean would do the trick. Note that the "col" values are strings in your input, so we also have to convert them to integers first before calculating the mean:

    out = (df.melt(['cluster'], var_name='col')
           .assign(value=lambda x: x['value'].astype(int))
           .groupby(['col', 'cluster'], as_index=False).mean())
    

    Output:

        col cluster  value
    0  col1       A   0.40
    1  col1       C   0.50
    2  col2       A   0.80
    3  col2       C   0.75