I have created the following pandas dataframe:
import pandas as pd
import numpy as np
ds = { 'trend' : [1,1,1,1,2,2,3,3,3,3,3,3,4,4,4,4,4], 'price' : [23,43,56,21,43,55,54,32,9,12,11,12,23,3,2,1,1]}
df = pd.DataFrame(data=ds)
The dataframe looks as follows:
display(df)
trend price
0 1 23
1 1 43
2 1 56
3 1 21
4 2 43
5 2 55
6 3 54
7 3 32
8 3 9
9 3 12
10 3 11
11 3 12
12 4 23
13 4 3
14 4 2
15 4 1
16 4 1
I have saved the dataframe to a .csv file called df.csv
:
df.to_csv("df.csv", index = False)
I need to create a new field called minimum
which:
price
observed at each iteration and the last price
observed in the previous trend
.For example:
Fast forward to record 4.
price
observed at record 4 (price: 43
) and the last price
observed for the previous trend
(price: 21
). The result is 21.Fast forward to record 14.
price
observed at record 14 (price: 2
) and the last price
observed for the previous trend
(price: 12
). The result is 2.And so on.
I have then written this code:
minimum = []
for i in range(len(df)):
ds = pd.read_csv("df.csv", nrows=i+1)
d = ds.groupby('trend', as_index=False).agg(
{'price':'last'})
d['minimum'] = d['price'].min()
minimum.append(d['minimum'].iloc[-1])
ds['minimum'] = minimum
The resulting dataframe looks as follows:
display(ds)
trend price minimum
0 1 23 23
1 1 43 43
2 1 56 56
3 1 21 21
4 2 43 21
5 2 55 21
6 3 54 21
7 3 32 21
8 3 9 9
9 3 12 12
10 3 11 11
11 3 12 12
12 4 23 12
13 4 3 3
14 4 2 2
15 4 1 1
16 4 1 1
The resulting dataframe is correct.
The problem is that I have to apply this process to a dataframe which contains about 1 million records and it will take about 48 years to complete.
Does anybody know a quicker way to obtain the same results above?
Aggregate, shift
, then map
and get the minimum
with numpy:
prev = (df.groupby('trend')['price'].min()
.shift(fill_value=np.inf)
)
df['minimum'] = np.minimum(df['price'], df['trend'].map(prev))
Output:
trend price minimum
0 1 23 23
1 1 43 43
2 1 56 56
3 1 21 21
4 2 43 21
5 2 55 21
6 3 54 43
7 3 32 32
8 3 9 9
9 3 12 12
10 3 11 11
11 3 12 12
12 4 23 9
13 4 3 3
14 4 2 2
15 4 1 1
16 4 1 1