Search code examples
pythonpandasdataframestackreshape

Reshaping a dataframe every nth column


I have two datasets. After merging them horzontally, and sorting the columns with the following code, I get the dataset below:

df=

X Y
5.2 6.5
3.3 7.6

df_year=

X Y
2014 2014
2015 2015
df_all_cols = pd.concat([df, df_year], axis = 1)
sorted_columns = sorted(df_all_cols.columns)
df_all_cols_sort = df_all_cols[sorted_columns]
X X Y Y
5.2 2014 6.5 2014
3.3 2015 7.6 2015

I am trying to make my data look like this, by stacking the dataset every 2 columns.

name year Variable
5.2 2014 X
3.3 2015 X
6.5 2014 Y
7.6 2015 Y

Solution

  • One approach could be as follows:

    • Apply df.stack to both dfs before feeding them to pd.concat. The result at this stage being:
           0     1
    0 X  5.2  2014
      Y  6.5  2014
    1 X  3.3  2015
      Y  7.6  2015
    
    • Next, use df.sort_index to sort on the original column names (i.e. "X, Y", now appearing as index level 1), and get rid of index level 0 (df.droplevel).
    • Finally, use df.reset_index with drop=False to insert index as a column and rename all the columns with df.rename.
    res = (pd.concat([df.stack(),df_year.stack()], axis=1)
           .sort_index(level=1)
           .droplevel(0)
           .reset_index(drop=False)
           .rename(columns={'index':'Variable',0:'name',1:'year'})
           )
    
    # change the order of cols
    res = res.iloc[:, [1,2,0]]
    
    print(res)
    
       name  year Variable
    0   5.2  2014        X
    1   3.3  2015        X
    2   6.5  2014        Y
    3   7.6  2015        Y