My goal is to to calculate the moving average line for avg_price
column each customer.
The dataframe looks like this:
customer avg_price avg_price2 count1 count2 rate
date_time
2022-06-11 00:00:00 Customer1 4.4656 1.25 36 11084 0.003237
2022-06-11 00:00:00 Customer2 7.8873 0.92 10 22150 0.000451
2022-06-11 00:00:00 Customer3 2.3016 1.37 1 2521 0.000397
2022-06-11 00:00:00 Customer4 3.2421 1.05 221 98973 0.002228
2022-06-11 00:00:00 Customer5 1.0050 0.94 2 410 0.004854
... ... ... ... ... ... ...
2022-06-21 10:00:00 Customer1 4.9450 1.99 340 118000 0.002873
2022-06-21 10:00:00 Customer2 4.0643 2.06 268 20850 0.012691
2022-06-21 10:00:00 Customer3 3.7034 1.00 25 5100 0.004878
2022-06-21 10:00:00 Customer4 5.0367 2.64 2098 118251 0.017433
2022-06-21 10:00:00 Customer5 2.7429 1.57 50 11900 0.004184
Now, I already believe I found a way to do the calculation so I'm trying to figure out a way to assign the results of this group to the dataframe as a new column.
df.groupby('customer')['avg_price'].rolling(50).mean()
The results looks like this:
customer date_time
Customer1 2022-06-11 00:00:00 NaN
2022-06-11 06:44:00 NaN
2022-06-11 08:07:00 NaN
2022-06-11 08:10:00 NaN
2022-06-11 08:20:00 NaN
...
Customer2 2022-06-21 09:56:00 14.048820
2022-06-21 09:57:00 14.060702
2022-06-21 09:58:00 14.054652
2022-06-21 09:59:00 14.024310
2022-06-21 10:00:00 14.020486
Name: avg_price, Length: 228442, dtype: float64
When I tried adding this to the dataframe as a column, I got the following error:
df['ma_24h'] = df.groupby('customer' ['avg_price'].rolling(50).mean().reset_index(level=0,drop=True)
ValueError: cannot reindex from a duplicate axis
I want to add the calculated column, the results without customer to the existing dataframe so that it respects the already existing index which the date_time column.
I believe I found a way thanks to this post: How to include Moving Average with Pandas based on Values on other Columns
df['ma_24h'] = (df.groupby(['customer'])['avg_price']
.rolling('24H', min_periods=360)
.mean()
.sort_index(level=1)
.values)
I changed the rolling to 24H. The main issue I had was with the index and getting the values and apparently I had to sort the index and just select the values.
If there are other simpler ways to accomplish this, I'm open for them.