Search code examples
pythonpandasdataframepivot-table

Create new columns from information on rows in Python


So I have a Pandas dataframe with the following structure:

ClientId Product Quantity
01 Apples 2
01 Oranges 3
01 Bananas 1
02 Apples 4
02 Bananas 2

and would like to get this df to look like:

ClientId Product_Apples Quantity_Apples Product_Oranges Quantity_Oranges Product_Bananas Quantity_Bananas
01 1 2 1 3 1 1
02 1 4 0 0 1 2

where the columns starting with Product are binary variables.

What would be the Python code for this transformation?


Solution

  • Assuming your dataset is a pandas dataframe, you can try this:

    import pandas as pd
    
    df = pd.DataFrame({
        'ClientID': ['01']*3 + ['02']*2,
        'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas'],
        'Quantity': [2, 3, 1, 4, 2]
    })
    
    df['product_cnt'] = 1
    
    # solution 1
    df.pivot(index='ClientID', columns='Product', values=['Quantity', 'product_cnt']).fillna(0)
    
    # solution 2
    df_products = df.pivot(index='ClientID', columns='Product', values='product_cnt').fillna(0).add_prefix('Product_')
    df_qty = df.pivot(index='ClientID', columns='Product', values='Quantity').fillna(0).add_prefix('Quantity_')
    df_products.merge(df_qty, on='ClientID', how='left')