python pandas dataframe reshape multi-index

Slicing a DataFrame in pandas?


I have a problem with a DataFrame in pandas. Let's suppose I have this DataFrame:

import pandas as pd

k = [1,2,3,4,5,6,7,8,9,10,11,12]

k = pd.DataFrame(k).T

This is a 1x12 DataFrame, and I want to build a 4-column DataFrame from it, like k4:

k1 = pd.DataFrame([1,2,3,4])
k2 = pd.DataFrame([5,6,7,8])
k3 = pd.DataFrame([9,10,11,12])
frames = [k1,k2,k3]
k4 = pd.concat(frames, axis=1).T

My original df is much larger than k, but its number of columns is a multiple of 4, and I want to slice it into a 4-column df. I guess it could be something related to i%4 == 0, but I don't really know how to do it.

Thanks in advance.

I made a mistake: I should have transposed k4. Sorry, guys.

To sum up, I have one long row whose length is a multiple of 4, much longer than 12:

   0  1  2  3  4  5  6  7  8   9  10  11
0  1  2  3  4  5  6  7  8  9  10  11  12

And I need to make a df with 4 columns, starting a new row after every 4 elements:

   0   1   2   3
0  1   2   3   4
0  5   6   7   8
0  9  10  11  12

Solution

  • You can first create a MultiIndex in the columns using floor division and modulo, then use stack. To remove the first level of the resulting MultiIndex in the index, add reset_index:

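    import pandas as pd
    
    k = [1,2,3,4,5,6,7,8,9,10,11,12]
    
    k = pd.DataFrame(k).T
    # first level: target row (floor division), second level: target column (modulo)
    k.columns = [k.columns // 4, k.columns % 4]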
    print (k)
       0           1           2            
       0  1  2  3  0  1  2  3  0   1   2   3
    0  1  2  3  4  5  6  7  8  9  10  11  12
    
    print (k.stack().reset_index(level=0, drop=True))
       0  1   2
    0  1  5   9
    1  2  6  10
    2  3  7  11
    3  4  8  12
    

    EDIT:

    You only need to pass 0 to stack, so that the first level of the column MultiIndex is stacked instead of the default last level:

    print (k.stack(0).reset_index(level=0, drop=True))
       0   1   2   3
    0  1   2   3   4
    1  5   6   7   8
    2  9  10  11  12
    

    Or swap the modulo and the floor division when building the columns, so that the default stack gives the same result:

    k = [1,2,3,4,5,6,7,8,9,10,11,12]
    
    k = pd.DataFrame(k).T
    k.columns = [k.columns % 4, k.columns // 4]
    print (k)
       0  1  2  3  0  1  2  3  0   1   2   3
       0  0  0  0  1  1  1  1  2   2   2   2
    0  1  2  3  4  5  6  7  8  9  10  11  12
    
    print (k.stack().reset_index(level=0, drop=True))
       0   1   2   3
    0  1   2   3   4
    1  5   6   7   8
    2  9  10  11  12
    

    Another solution, using numpy.ndarray.reshape, is faster:

    import numpy as np
    import pandas as pd
    
    k = [1,2,3,4,5,6,7,8,9,10,11,12]
    # -1 lets numpy infer the number of rows
    print (pd.DataFrame(np.array(k).reshape(-1,4)))
       0   1   2   3
    0  1   2   3   4
    1  5   6   7   8
    2  9  10  11  12
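
If your data already sits in a one-row DataFrame (as in the question) rather than in a plain list, the reshape idea above could be applied roughly like this. This is only a sketch: the helper name wrap_row and its width parameter are made up for illustration, and it assumes the frame has a single row whose length is a multiple of the width.

```python
import numpy as np
import pandas as pd


def wrap_row(df, width=4):
    """Hypothetical helper: wrap the values of a one-row frame into rows of `width`."""
    # flatten the frame's values into a 1-D array
    values = df.to_numpy().ravel()
    # reshape would fail anyway, but an explicit check gives a clearer message
    if values.size % width != 0:
        raise ValueError(f"{values.size} values cannot be split into rows of {width}")
    # -1 lets numpy infer the number of rows
    return pd.DataFrame(values.reshape(-1, width))


# same 1x12 frame as in the question
k = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]).T
k4 = wrap_row(k)
print(k4)
```

Unlike the k4 built with concat in the question, the result gets a fresh 0, 1, 2 row index, which matches the reset_index step used in the stack-based versions.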