Search code examples
pythonpandasnumpypivotpivot-table

Calculating multiple colors and duplicate colors in a pivot table in pandas


I have a DataFrame df that contains information about orders, including the 'Webshop_Order', 'Category', 'Level', 'Class', 'USIM', 'Size', 'Color', 'Length', 'Demand_Qty', and 'Return_Qty' columns.

I want to create a pivot table that includes the number of orders for each size and color combination per USIM and Webshop_Order.

I would like to calculate some summary columns that I am able to do:

'multiple_sizes_in_transaction' 'duplicate_sizes_in_transaction'

Here's my current code:

# ...

# Pivot to create table of USIM/ORDER with the number of orders of each size
sizes_per_order = df.pivot_table(
    index=['USIM', 'Webshop_Order'],
    columns='Size',
    values='Demand_Qty',
    aggfunc='sum',
    fill_value=0,
)

sizes_per_order = sizes_per_order.assign(
    multiple_sizes_in_transaction=sizes_per_order.gt(0).sum(axis=1).gt(1),
    duplicate_sizes_in_transaction=sizes_per_order.gt(1).any(axis=1),
)

# ...

enter image description here However, I also want to include the 'Color' parameter in the pivot table and calculate the number of orders with

  1. multiple colors
  2. duplicate colors.

so, I decided to update the code

# Pivot to create table of USIM/ORDER with the number of orders of each size and color
sizes_per_order = df.pivot_table(
    index=['USIM', 'Webshop_Order'],
    columns=['Size', 'Color'],
    values='Demand_Qty',
    aggfunc='sum',
    fill_value=0,
)

sizes_per_order = sizes_per_order.assign(
    multiple_sizes_in_transaction=sizes_per_order.gt(0).sum(axis=1).gt(1),
    multiple_colors_in_transaction=sizes_per_order.astype(bool).sum(axis=1, level='Color').gt(1).all(axis=1),
    duplicate_sizes_in_transaction=sizes_per_order.gt(1).any(axis=1),
    duplicate_colors_in_transaction=sizes_per_order.astype(bool).sum(axis=1, level='Color').gt(1).any(axis=1)
)

sizes_per_order

enter image description here but the output is not visually good, it's difficult to read it well due to a lot of column values.

Could you please guide me on how to modify the code to achieve a nice representation of this?

Thank you in advance for your help!


Solution

  • Using the style element of the DataFrame, you can apply styling and create a nicer-looking representation of the pivot table. An revised version of your code with formatting options is provided here:

    import pandas as pd
    
    # Pivot to create table of USIM/ORDER with the number of orders of each size and color
    sizes_per_order = df.pivot_table(
        index=['USIM', 'Webshop_Order'],
        columns=['Size', 'Color'],
        values='Demand_Qty',
        aggfunc='sum',
        fill_value=0,
    )
    
    # Calculate summary columns
    multiple_sizes = sizes_per_order.gt(0).sum(axis=1).gt(1)
    multiple_colors = sizes_per_order.astype(bool).sum(axis=1, level='Color').gt(1).all(axis=1)
    duplicate_sizes = sizes_per_order.gt(1).any(axis=1)
    duplicate_colors = sizes_per_order.astype(bool).sum(axis=1, level='Color').gt(1).any(axis=1)
    
    # Combine summary columns into a new DataFrame
    summary_df = pd.DataFrame({
        'Multiple Sizes': multiple_sizes,
        'Multiple Colors': multiple_colors,
        'Duplicate Sizes': duplicate_sizes,
        'Duplicate Colors': duplicate_colors
    })
    
    # Merge the summary DataFrame with the original pivot table
    result = sizes_per_order.merge(summary_df, left_index=True, right_index=True)
    
    # Apply formatting to the DataFrame
    # Apply formatting to the DataFrame
    styled_result = result.astype(int).style.background_gradient(cmap='Blues', subset=pd.IndexSlice[:, :])
    
    # Display the styled DataFrame
    styled_result
    
    

    The background_gradient method in the code above applies a color gradient to the pivot table's cells, making it simpler to see patterns visually. If you'd prefer to utilize a different colormap, you can change the cmap parameter. To further alter the design of the table, you might investigate various formatting possibilities offered by the style attribute.

    If necessary, make sure to change 'Demand_Qty' in the pivot_table function to the name of the correct column.

    For more style customization, I‘d suggest reading the documentation.

    New Topic: 7000rows+

    A huge DataFrame with 7000 rows can require a lot of computing work and may take a long time to style. The background gradient calculation, which requires processing a lot of data, is one potential cause of the execution's lag. some ideas to improve the performance:

    Reduce the subset: Take into account choosing a subset of the data for style rather than applying the styling to the complete DataFrame. This can be done by entering the correct row and column indices in the background_gradient method's subset parameter. The processing time can be shortened by restricting the subset. Change the colormap: Compared to other colormaps, the Blues colormap used in the example might need more calculation.

    Use a simpler colormap that might be easier to compute, such cool or binary. Execute in chunks: You can divide the data into smaller chunks and apply styling to each chunk independently if decreasing the subset is not sufficient. In this manner, you can see a portion of the findings while the procedure is still running. Disable more styling choices: Disabling such settings may be able to speed up execution if the DataFrame has additional styling added, such as conditional formatting or other features. Keep in mind that the intricacy of the applied styling and the available processing resources have a significant impact on how well huge DataFrames style. If the performance is still slow after these modifications, you can think about trying another method of visualizing or summarizing data.