Dataframe data will be :
COL1 | COL2 | COL3 |
---|---|---|
100 | 200 | 300 |
101 | 201 | 301 |
102 | 202 | 302 |
103 | 203 | 303 |
Expected output: Format as HBase (Columnar fashion)
Consider COL1 as KEY_COLUMN vaue
KEY_COLUMN | KEY | VALUE |
---|---|---|
100 | COL2 | 200 |
100 | COL3 | 300 |
101 | COL2 | 201 |
101 | COL3 | 301 |
102 | COL2 | 202 |
102 | COL3 | 302 |
103 | COL2 | 203 |
103 | COL3 | 303 |
I tried using .loc which transpose/convert all key to value like below.
for idx in df.index:
print (df.loc[idx])
COL1 100
COL2 200
COL3 300
COL1 101
COL2 201
COL3 301
COL1 102
COL2 202
COL3 302
COL1 103
COL2 203
COL3 303
But I couldn't bring the KEY_COLUMN as column 1 and KEY and VALUE as subsequent columns. Could anyone please suggest. Thanks!
Try pd.melt
:
df = (
df.rename(columns={"COL1": "KEY_COLUMN"})
.melt("KEY_COLUMN", var_name="KEY", value_name="VALUE")
.sort_values(by="KEY_COLUMN")
)
print(df)
Prints:
KEY_COLUMN KEY VALUE
0 100 COL2 200
4 100 COL3 300
1 101 COL2 201
5 101 COL3 301
2 102 COL2 202
6 102 COL3 302
3 103 COL2 203
7 103 COL3 303