groupby DataFrame by N columns or N rows


I'd like to find a general way to group a DataFrame by a specified number of rows or columns. Example DataFrame:

import pandas as pd

df = pd.DataFrame(0, index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'])

   c1  c2  c3  c4  c5  c6  c7
a   0   0   0   0   0   0   0
b   0   0   0   0   0   0   0
c   0   0   0   0   0   0   0
d   0   0   0   0   0   0   0
e   0   0   0   0   0   0   0
f   0   0   0   0   0   0   0

For example, I'd like to group 2 rows at a time and apply a function such as mean. I'd also like to know how to group N columns at a time and apply a function.

Expected output when grouping 2 rows at a time:

   c1  c2  c3  c4  c5  c6  c7
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0

Expected output when grouping 2 columns at a time:

   0  1  2  3
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0
d  0  0  0  0
e  0  0  0  0
f  0  0  0  0

Solution

  • This groups by N rows:

    >>> import numpy as np
    >>> N = 2
    
    >>> df.groupby(np.arange(len(df.index))//N, axis=0).mean()
       c1  c2  c3  c4  c5  c6  c7
    0   0   0   0   0   0   0   0
    1   0   0   0   0   0   0   0
    2   0   0   0   0   0   0   0
    

  • This groups by N columns:

    >>> df.groupby(np.arange(len(df.columns))//N, axis=1).mean()
       0  1  2  3
    a  0  0  0  0
    b  0  0  0  0
    c  0  0  0  0
    d  0  0  0  0
    e  0  0  0  0
    f  0  0  0  0
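
Both snippets rely on the same trick: integer-dividing the positions 0, 1, 2, ... by N gives a label array such as [0 0 1 1 2 2 3], and consecutive rows or columns that share a label land in the same group. The sketch below wraps this in a helper and uses non-zero data so the effect of the mean is visible. The name `group_every_n` and the sample values are made up for illustration. For the column case it groups the transposed frame instead of passing `axis=1`, since that argument of `groupby` is deprecated in recent pandas versions; treat this as one possible workaround rather than the only one.

```python
import numpy as np
import pandas as pd


def group_every_n(df, n, axis=0, func="mean"):
    """Aggregate every n consecutive rows (axis=0) or columns (axis=1).

    Hypothetical helper, not part of pandas.
    """
    if axis == 0:
        # One label per row: 0, 0, 1, 1, ... for n == 2
        labels = np.arange(len(df.index)) // n
        return df.groupby(labels).agg(func)
    # One label per column; transpose so columns become rows,
    # group them, then transpose back to the original orientation.
    labels = np.arange(len(df.columns)) // n
    return df.T.groupby(labels).agg(func).T


# Sample data (an assumption for illustration): values 0..41 in a 6x7 frame
df = pd.DataFrame(
    np.arange(42).reshape(6, 7),
    index=list("abcdef"),
    columns=[f"c{i}" for i in range(1, 8)],
)

print(np.arange(len(df.columns)) // 2)  # group labels for the 7 columns

rows = group_every_n(df, 2, axis=0)  # 3 rows x 7 columns
cols = group_every_n(df, 2, axis=1)  # 6 rows x 4 columns
print(rows)
print(cols)
```

When the number of rows or columns is not a multiple of N, the last group is simply smaller: here the seventh column forms a group of its own.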