Search code examples
python-3.xpandasdataframeindexingpivot-table

How to drop irrelevant indexes in multi index pivot pandas


I have 2 indexes say: year and month. And I’m using pivot to display the count of products.

Now, in my df, let’s say there are no records of month july and august in 2020. But the pivot will show these months and values 0. I don’t want the pivot to show these irrelevant rows (which are not present in the df) as they make my final pivot very long. How to reduce this?

Here is my example df:

df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                   'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                   'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                  })

And my code so far:

import pandas as pd
import numpy as np
df = pd.read_csv('try.csv')
bins = [0,1000,2000,5000,float(np.inf)]
labels = ['0-1000','1000-2000','2000-5000','5000+']
df['bins'] = pd.cut(df['Amount'],bins=bins, labels=labels, right=True)
pivot = df.pivot_table(index=['Product Type','Product'],columns='bins', aggfunc='size')
pivot.dropna(inplace=True)
pivot

Expected ouput:

Amount                 0-1000  1000-2000  2000-5000  5000+
Product Type Product                                      
Bikes        Hayabusa       1          0          1      0
Cars         BMW M3         1          0          2      0
Fruits       Apple          0          1          1      0
             Banana         0          0          2      0
Vegetable    Apple          0          1          1      0
             Brocoli        0          0          1      0
             Carrot         0          0          1      0
Vegetables   Brocoli        0          1          0      0

In the df, Bikes only contains 'hayabusa', which i want in my pivot's Bike category. How should I do this?


Solution

  • Use cut and crosstab:

    bins = [0, 1000, 2000, 5000, np.inf]
    labels = ['0-1000', '1000-2000', '2000-5000', '5000+']
    
    out = pd.crosstab([df['Product Type'], df['Product']],
                      pd.cut(df['Amount'], bins=bins, labels=labels)
                     ).reindex(columns=labels, fill_value=0)
    

    Output:

    Amount                 0-1000  1000-2000  2000-5000  5000+
    Product Type Product                                      
    Bikes        Hayabusa       1          0          1      0
    Cars         BMW M3         1          0          2      0
    Fruits       Apple          0          1          1      0
                 Banana         0          0          2      0
    Vegetable    Apple          0          1          1      0
                 Brocoli        0          0          1      0
                 Carrot         0          0          1      0
    Vegetables   Brocoli        0          1          0      0
    

    Used input:

    df = pd.DataFrame({'Product Type': ['Fruits', 'Fruits', 'Vegetable', 'Vegetable', 'Vegetable', 'Vegetable', 'Fruits', 'Fruits', 'Vegetables', 'Cars', 'Cars', 'Cars', 'Bikes', 'Bikes'],
                       'Product': ['Apple', 'Banana', 'Apple', 'Apple', 'Brocoli', 'Carrot', 'Apple', 'Banana', 'Brocoli', 'BMW M3', 'BMW M3', 'BMW M3', 'Hayabusa', 'Hayabusa'],
                       'Amount': [4938, 3285, 4947, 1516, 2212, 3778, 1110, 4436, 1049, 494, 2818, 3737, 954, 4074],
                      })