Search code examples
pythonpandasdataframetransformation

Transpose rows into columns and create new columns in pandas


Run the code to create the obfuscated data:

from itertools import product

import numpy as np
import pandas as pd


a = [1, 2, 3]
b = [20, 40, 80, 160]
c = ['A', 'B']
d = [10, 20, 30, 40]
e = ['Alpha', 'Beta']
f = ['X', 'Y', 'Z']
i = ['a', 'b', 'c', 'd', 'e', 'f']

df = pd.DataFrame(product(a, b, c, d, e, f, i),
                  columns=['col_A', 'col_B', 'col_C', 'col_D', 'col_E', 'col_F', 'col_G'])

df['col_H'] = np.random.randint(0, 100, size=df.shape[0])
df['col_I'] = np.random.randint(0, 100, size=df.shape[0])
df['col_J'] = np.random.randint(0, 100, size=df.shape[0])

print(df)

I get the following table:

      col_A  col_B col_C  col_D  col_E col_F col_G  col_H  col_I  col_J
0         1     20     A     10  Alpha     X     a     44     55     12
1         1     20     A     10  Alpha     X     b     50     15     46
2         1     20     A     10  Alpha     X     c     42     36     46
3         1     20     A     10  Alpha     X     d     71     19     61
4         1     20     A     10  Alpha     X     e     74      1      7
5         1     20     A     10  Alpha     X     f     11     99      2
...     ...    ...   ...    ...    ...   ...   ...    ...    ...    ...
3450      3    160     B     40   Beta     Z     a     47     81     58
3451      3    160     B     40   Beta     Z     b     63     82     26
3452      3    160     B     40   Beta     Z     c     37     28     64
3453      3    160     B     40   Beta     Z     d     54     60     91
3454      3    160     B     40   Beta     Z     e     16     14     55
3455      3    160     B     40   Beta     Z     f     40     18     97

[3456 rows x 10 columns]

I want to transpose the last 4 columns, set the values of first column as headers, and add a new column based on the remaining 3 columns names.

See below:

      col_A  col_B col_C  col_D  col_E col_F new_col   a   b   c   d   e   f
0         1     20     A     10  Alpha     X   col_H  68  89   0  99  50  11
1         1     20     A     10  Alpha     X   col_I  27  43  20  45  97  99
2         1     20     A     10  Alpha     X   col_J  40  43  36  89  90   2
...     ...    ...   ...    ...    ...   ...     ... ... ... ... ... ... ...
1150      3    160     B     40   Beta     Z   col_H  59  50  95  31  32  68
1151      3    160     B     40   Beta     Z   col_I  14  94  25  20  46  90
1152      3    160     B     40   Beta     Z   col_J  87  32  34  99  91  90

I've tried doing it with melt, pivot, and pivot_table, but I can't figure it out.


Solution

  • Here's one way via set_index / stack / unstack:

    df = df.set_index(df.columns[:-3].to_list()).stack().unstack(-2).reset_index()
    

    OUTPUT:

    print(df.head(10))
    
    col_G  col_A  col_B col_C  col_D  col_E col_F level_6   a   b   c   d   e   f
    0          1     20     A     10  Alpha     X   col_H  97  96  59  71  98  19
    1          1     20     A     10  Alpha     X   col_I   0  55   8  26  70  85
    2          1     20     A     10  Alpha     X   col_J  90  87  16  70  68  39
    3          1     20     A     10  Alpha     Y   col_H  65  69   9  92  35  11
    4          1     20     A     10  Alpha     Y   col_I  28  47  56  83  43  81
    5          1     20     A     10  Alpha     Y   col_J   8  29  89  47  84  32
    6          1     20     A     10  Alpha     Z   col_H  97  12  95  17  34  47
    7          1     20     A     10  Alpha     Z   col_I  78  64  51  81  15  62
    8          1     20     A     10  Alpha     Z   col_J  47  83  58   3  90  11
    9          1     20     A     10   Beta     X   col_H  19   8  92  69  65  32