My dataframe is in the linked image. To keep it simple, it currently looks something like this:
Gene | Cell_A | Cell_B | Cell_B | Cell_B | Cell_A |
---|---|---|---|---|---|
Gene_A | 0 | 4 | 35.5 | 4.5 | 3.5 |
Gene_B | 1.3 | 52 | 3.4 | 2.4 | 0 |
Gene_C | 2.3 | 3.3 | 32 | 0 | 2 |
There are 3105 columns of Cell_A and Cell_B combined, and around 13k (I think?) rows of genes. What I want is the average value per gene (row), grouped by the unique column name. So in the end I would have just 2 columns, Cell_A and Cell_B, each holding the per-gene (i.e. per-row) average.
I expect that it has something to do with either agg or groupby, but I have no idea where to even start. If you can offer some guidance I would be very grateful!
You are right: you want to group by the column names and take the mean of each group.
First, preserve the first column as an index:
df = df.set_index(['Gene'])
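Then group the columns by name and take the mean:
df.groupby(by=df.columns, axis=1).mean()
Note that the axis=1 argument of groupby is deprecated since pandas 2.1. On recent versions, transpose, group on the index, and transpose back:
df.T.groupby(level=0).mean().T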
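For reference, here is a minimal sketch of the whole thing on the toy table from the question. It assumes a recent pandas, so it uses the transpose form rather than `axis=1`; the variable name `means` is just for illustration.

```python
import pandas as pd

# Toy data copied from the question; the duplicate column names are intentional.
df = pd.DataFrame(
    [
        ["Gene_A", 0, 4, 35.5, 4.5, 3.5],
        ["Gene_B", 1.3, 52, 3.4, 2.4, 0],
        ["Gene_C", 2.3, 3.3, 32, 0, 2],
    ],
    columns=["Gene", "Cell_A", "Cell_B", "Cell_B", "Cell_B", "Cell_A"],
)

# Keep the gene names out of the averaging by making them the index.
df = df.set_index("Gene")

# Transpose so the cell labels become the row index, group rows that share
# a label, average them, then transpose back to genes-as-rows.
means = df.T.groupby(level=0).mean().T

print(means)
# Gene_A: Cell_A = (0 + 3.5) / 2 = 1.75, Cell_B = (4 + 35.5 + 4.5) / 3 ≈ 14.67
```

The result has one row per gene and one column per unique cell label, so with your real data it should come out as roughly 13k rows by 2 columns.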