python python-3.x pandas pandas-groupby pandas-melt

Transpose or melt to group by columns?

I have this data frame:

df = pd.DataFrame({ "cluster" : ["A", "A", "A", "A", "A", "C", "C", "C", "C"], 
                    "col1": ["0", "0", "1", "1", "0", "1", "0", "1", "0"], 
                    "col2": ["1", "1", "1", "1", "0", "1", "1", "1", "0"] })

and I can't figure out the best way to have a data frame result with this format

col	cluster	avg
col1	A	0.4
col1	C	0.5
col2	A	0.8
col2	C	0.75

I tried using transpose, then melt the cluster, but didn't work. Any help would be highly appreciated!

My desired result is the table - where all columns (except cluster) are rows, and cluster is melt and the average is calculated for the values of the column for each cluster

Solution

Looks like a combination of melt and groupby + mean would do the trick. Note that the "col" values are strings in your input, so we also have to convert them to integers first before calculating the mean:

out = (df.melt(['cluster'], var_name='col')
       .assign(value=lambda x: x['value'].astype(int))
       .groupby(['col', 'cluster'], as_index=False).mean())

Output:

    col cluster  value
0  col1       A   0.40
1  col1       C   0.50
2  col2       A   0.80
3  col2       C   0.75