Search code examples
pythondataframepandas-loc

Python Dataframe to Columnar format for accessing the columns dynamically


Dataframe data will be :

COL1 COL2 COL3
100 200 300
101 201 301
102 202 302
103 203 303

Expected output: Format as HBase (Columnar fashion)

Consider COL1 as KEY_COLUMN vaue

KEY_COLUMN KEY VALUE
100 COL2 200
100 COL3 300
101 COL2 201
101 COL3 301
102 COL2 202
102 COL3 302
103 COL2 203
103 COL3 303

I tried using .loc which transpose/convert all key to value like below.

for idx in df.index:
     print (df.loc[idx])
COL1  100
COL2  200
COL3  300
COL1  101
COL2  201
COL3  301
COL1  102
COL2  202
COL3  302
COL1  103
COL2  203
COL3  303

But I couldn't bring the KEY_COLUMN as column 1 and KEY and VALUE as subsequent columns. Could anyone please suggest. Thanks!


Solution

  • Try pd.melt:

    df = (
        df.rename(columns={"COL1": "KEY_COLUMN"})
        .melt("KEY_COLUMN", var_name="KEY", value_name="VALUE")
        .sort_values(by="KEY_COLUMN")
    )
    print(df)
    

    Prints:

       KEY_COLUMN   KEY  VALUE
    0         100  COL2    200
    4         100  COL3    300
    1         101  COL2    201
    5         101  COL3    301
    2         102  COL2    202
    6         102  COL3    302
    3         103  COL2    203
    7         103  COL3    303