Search code examples
pythonpandasdataframedata-analysis

Most efficient way to determine average # ordered per item per day in Python


I am new to Python and have a large dataset (over 55,000 rows to be exact) in a dataframe with columns UPC, quantity_picked, and date. I am trying to determine the most efficient way to calculate the average quantity picked per UPC per day. Is there a way that this can be done more efficiently than using nested loops?

I have tried the .mean approach but of course it only provides the average quantity picked per UPC per order. I have also tried implementing .groupby().

practice_df_by_UPC = practice_df.groupby('UPC')
print(practice_df_by_UPC)

I receive no error but no output either.

I am wanting an output that will allow me to see the average number of items picked by UPC per day. Is there a way to do this for dataframes that I'm not finding that is more efficient, or is a set of nested loops the way to go?


Solution

  • After switching out groupby with set_index and doing some extra troubleshooting, the following code resolved the issue and accomplished the goal of determining the average quantity for a specified UPC and date (example UPC and date plugged in):

    #creating multi-index based on date and UPC
    practice_df.set_index(['date_expected', 'UPC'], inplace=True)
    
    print('Mean:')
    print(practice_df.loc['2019-05-15', '0000000004011'].mean())
    

    The inplace=True addition got rid of the remaining error. While printing the dataframe it is now multi indexed organized first by date, then by UPC.