Search code examples
pandasdataframegroup-byerror-handling

How to handle KeyError(key)


import pandas as pd

colnames = ['Date', 'Items', 'Quantity', 'Price']
df1 = pd.read_csv('data_assignment_1.txt',sep=" ",names=colnames, header=None)

print(df1)


Output:

         Date   Items  Quantity  Price

0  2020-09-23  Item_A         1    1.9
1  2020-09-23  Item_B         1    1.2
2  2020-09-23  Item_A         2    1.9
3  2020-09-23  Item_B         1    1.2
4  2020-09-24  Item_A         1    1.9
5  2020-09-24  Item_B         1    1.2
6  2020-09-24  Item_C         1    1.3
7  2020-09-25  Item_D         1    2.7

Now I groupby date and items to find the total Quantity:

groupby_date_item = df1.groupby(['Date', 'Items'])['Quantity'].sum()

print(groupby_date_item['2020-09-23','Item_A'])

Output result:
 3

Now the problem is if i put Item D with the date 2020-09-23 I will get an error:

print(groupby_date_item['2020-09-23','Item_D'])

Output result:
raise KeyError(key) from err
KeyError: ('2020-09-23', 'Item_d')

How do I handle the error if item does not exist on that date or input wrong date and item?


Solution

  • As you have discovered, you will get an Error if you try to retrieve a value that does not exist. One way to resolve this issue is to use a try-except clause.

    See example below - where I use a custom function to handle the data retrieval using a try-except clause. If the function generates an error, it will return 0.

    Code:

    import numpy as np
    import pandas as pd
    
    def get_Item(my_Date, my_Item, gb_date_item):
        try:
            total_sale = gb_date_item[my_Date, my_Item]
        except:
            total_sale = 0
            
        return f"Total Sale on: {my_Date} for {my_Item} : {total_sale}"
    
    df1 = pd.DataFrame({ 'Date': ['2020-09-23', '2020-09-23', '2020-09-23', '2020-09-23', '2020-09-24', '2020-09-24', '2020-09-24', '2020-09-25'],
                        'Items': ['Item_A', 'Item_B', 'Item_A', 'Item_B', 'Item_A', 'Item_B', 'Item_C', 'Item_D'],
                        'Quantity': [1, 1, 2, 1, 1, 1, 1, 1],
                        'Price': [1.9, 1.2, 1.9, 1.2, 1.9, 1.2, 1.3, 2.7]})
    
    groupby_date_item = df1.groupby(['Date', 'Items'])['Quantity'].sum()
    
    print(get_Item('2020-09-23', 'Item_A', groupby_date_item))
    print(get_Item('2020-09-23', 'Item_B', groupby_date_item))
    print(get_Item('2020-09-23', 'Item_D', groupby_date_item))
    
    

    Output:

    Total Sale on: 2020-09-23 for Item_A : 3
    Total Sale on: 2020-09-23 for Item_B : 2
    Total Sale on: 2020-09-23 for Item_D : 0
    

    Note:

    Replace 0 with whatever you want to show for when date is not found.

    except:
            total_sale = 0  # Replace this value