Data transformation for machine learning

I have dataset with SKU IDs and their counts, i need to feed this data into a machine learning algorithm, in a way that SKU IDs become columns and COUNTs are at the intersection of transaction id and SKU ID. Can anyone suggest how to achieve this transformation.

CURRENT DATA

TransID     SKUID      COUNT
1           31         1  
1           32         2 
1           33         1  
2           31         2  
2           34         -1

DESIRED DATA

TransID      31      32      33      34
  1          1        2      1       0
  2          2        0      0       -1

Solution

In Pandas, you can use pivot:

>>> df.pivot('TransID', 'SKUID').fillna(0)
        COUNT         
SKUID      31 32 33 34
TransID               
1           1  2  1  0
2           2  0  0 -1

To avoid ambiguity, it is best to explicitly label your variables:

df.pivot(index='TransID', columns='SKUID').fillna(0)

You can also perform a groupby and then unstack SKUID:

>>> df.groupby(['TransID', 'SKUID']).COUNT.sum().unstack('SKUID').fillna(0)
SKUID    31  32  33  34
TransID                
1         1   2   1   0
2         2   0   0  -1