So I have a Pandas dataframe with the following structure:
ClientId | Product | Quantity |
---|---|---|
01 | Apples | 2 |
01 | Oranges | 3 |
01 | Bananas | 1 |
02 | Apples | 4 |
02 | Bananas | 2 |
and would like to get this df to look like:
ClientId | Product_Apples | Quantity_Apples | Product_Oranges | Quantity_Oranges | Product_Bananas | Quantity_Bananas |
---|---|---|---|---|---|---|
01 | 1 | 2 | 1 | 3 | 1 | 1 |
02 | 1 | 4 | 0 | 0 | 1 | 2 |
where the columns starting with Product are binary variables.
What would be the Python code for this transformation?
Assuming your dataset is a pandas dataframe, you can try this:
import pandas as pd
df = pd.DataFrame({
'ClientID': ['01']*3 + ['02']*2,
'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas'],
'Quantity': [2, 3, 1, 4, 2]
})
df['product_cnt'] = 1
# solution 1
df.pivot(index='ClientID', columns='Product', values=['Quantity', 'product_cnt']).fillna(0)
# solution 2
df_products = df.pivot(index='ClientID', columns='Product', values='product_cnt').fillna(0).add_prefix('Product_')
df_qty = df.pivot(index='ClientID', columns='Product', values='Quantity').fillna(0).add_prefix('Quantity_')
df_products.merge(df_qty, on='ClientID', how='left')