I have dataset with SKU IDs and their counts, i need to feed this data into a machine learning algorithm, in a way that SKU IDs become columns and COUNTs are at the intersection of transaction id and SKU ID. Can anyone suggest how to achieve this transformation.
CURRENT DATA
TransID SKUID COUNT
1 31 1
1 32 2
1 33 1
2 31 2
2 34 -1
DESIRED DATA
TransID 31 32 33 34
1 1 2 1 0
2 2 0 0 -1
In Pandas, you can use pivot:
>>> df.pivot('TransID', 'SKUID').fillna(0)
COUNT
SKUID 31 32 33 34
TransID
1 1 2 1 0
2 2 0 0 -1
To avoid ambiguity, it is best to explicitly label your variables:
df.pivot(index='TransID', columns='SKUID').fillna(0)
You can also perform a groupby
and then unstack SKUID
:
>>> df.groupby(['TransID', 'SKUID']).COUNT.sum().unstack('SKUID').fillna(0)
SKUID 31 32 33 34
TransID
1 1 2 1 0
2 2 0 0 -1