Search code examples
pythonpandastransposemelt

How to Transpose Repeating Columns to Rows Pandas


I have a df1 that looks like this:

year    site    1   2   3   year    site    1   2   3   year    site    1   2   3   year    site    1   2   3
1991    A   4.1 5.9 4.1 1991    B   3.3 4.1 4.1 1991    C   4.1 0.6 4.1 1991    D   4.1 4.1 4.1
1992    A   6.2 5.7 6.2 1992    B   6.2 7.1 6.2 1992    C   6.2 6.2 6.2 1992    D   6.2 9.5 7.4
1993    A   2.6 1.9 4.7 1993    B   2.6 6.2 2.6 1993    C   5.4 8.3 2.6 1993    D   0.4 2.6 2.6

And, I am having trouble transposing the columns - which are months (1,2,3) - to rows for each site so that my reshaped df1 or df2 looks like this:

year    month   Site A  Site B  Site C  Site D
1991    1       4.1     3.3     4.1     4.1
1991    2       5.9     4.1     0.6     4.1
1991    3       4.1     4.1     4.1     4.1
1992    1       6.2     6.2     6.2     6.2
1992    2       5.7     7.1     6.2     9.5
1992    3       6.2     6.2     6.2     7.4
1993    1       2.6     2.6     5.4     0.4
1993    2       1.9     6.2     8.3     2.6
1993    3       4.7     2.6     2.6     2.6

I have tried using 'melt' and 'stack' but I do not understand how to reference the repeating months (1,2,3). Thank you,


Solution

  • Try the following using hard index slicing and reshaping:

    #Create input dataframe
    np.random.seed(0)
    df = pd.concat([pd.DataFrame({'year':[1991, 1992, 1993],
                      'site':[i]*3,
                      1:np.round(np.random.randint(2,8,3)+np.random.random(3),1),
                      2:np.round(np.random.randint(2,8,3)+np.random.random(3),1),
                      3:np.round(np.random.randint(2,8,3)+np.random.random(3),1)}) for i in [*'ABC']], axis=1)
    
    # index slice columns of the dataframe
    df_out = pd.concat([df.iloc[:,i:i+5] for i in range(0,df.shape[1],5)])
    
    # Reshape with melt, set_index, and unstack
    df_out =  df_out.melt(['year', 'site'], var_name='month')\
          .set_index(['year', 'month', 'site'])['value']\
          .unstack('site').add_prefix('Site ')\
          .reset_index()
    
    print(df_out)
    

    Output:

    site  year  month  Site A  Site B  Site C
    0     1991      1     6.6     6.0     5.5
    1     1991      2     7.3     5.5     7.6
    2     1991      3     3.9     2.5     4.7
    3     1992      1     7.5     2.1     5.6
    4     1992      2     4.1     2.8     7.9
    5     1992      3     2.1     3.8     2.1
    6     1993      1     2.4     5.9     4.0
    7     1993      2     6.3     3.1     2.7
    8     1993      3     3.1     3.1     3.7