I am new to Python and have a large dataset (over 55,000 rows to be exact) in a dataframe with columns UPC, quantity_picked, and date. I am trying to determine the most efficient way to calculate the average quantity picked per UPC per day. Is there a way that this can be done more efficiently than using nested loops?
I have tried the .mean approach but of course it only provides the average quantity picked per UPC per order. I have also tried implementing .groupby().
practice_df_by_UPC = practice_df.groupby('UPC')
print(practice_df_by_UPC)
I receive no error but no output either.
I am wanting an output that will allow me to see the average number of items picked by UPC per day. Is there a way to do this for dataframes that I'm not finding that is more efficient, or is a set of nested loops the way to go?
After switching out groupby with set_index and doing some extra troubleshooting, the following code resolved the issue and accomplished the goal of determining the average quantity for a specified UPC and date (example UPC and date plugged in):
#creating multi-index based on date and UPC
practice_df.set_index(['date_expected', 'UPC'], inplace=True)
print('Mean:')
print(practice_df.loc['2019-05-15', '0000000004011'].mean())
The inplace=True addition got rid of the remaining error. While printing the dataframe it is now multi indexed organized first by date, then by UPC.