During the day, new investment possibilities are registered, but the results (lay
column) are only registered at midnight each day.
So let's assume this CSV
:
clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,
2023/01/04,A,B,
2023/01/04,A,B,
Until yesterday, 2023/01/03
, the sum of the lines that have the value A
in competition
and B
in market_name
, was +1.28
I only invest if it is above 0
, so during today, every time this combination of values comes, the answer will be True
to invest.
At the end of the day, when the lay values are registered, I look at the total result:
clock_now,competition,market_name,lay
2022/12/30,A,B,-1
2022/12/31,A,B,1.28
2023/01/01,A,B,-1
2023/01/02,A,B,1
2023/01/03,A,B,1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
2023/01/04,A,B,-1
End of the day: -1,72
This means that tomorrow, if that same combination of values appears in the columns, I will not invest once because it will always be negative because it only calculates the values that it has until the previous day.
I'm trying to create a column to show where it was True and where it was False:
df = pd.read_csv('example.csv')
combinations = [['market_name', 'competition']]
for cbnt in combinations:
df['invest'] = (df.groupby(cbnt)['lay']
.apply(lambda s: s.cumsum().shift())
.gt(df['lay'])
)
df['cumulative'] = (df.groupby(cbnt)['lay']
.apply(lambda s: s.cumsum().shift())
)
print(df[['clock_now','invest','cumulative']])
But the result is this:
clock_now invest cumulative
0 2022/12/30 False NaN
1 2022/12/31 False -1.00
2 2023/01/01 True 0.28
3 2023/01/02 False -0.72
4 2023/01/03 False 0.28
5 2023/01/04 True 1.28
6 2023/01/04 True 0.28
7 2023/01/04 True -0.72
The expected result would be this:
clock_now invest cumulative
0 2022/12/30 False NaN
1 2022/12/31 False -1.00
2 2023/01/01 True 0.28
3 2023/01/02 False -0.72
4 2023/01/03 True 0.28
5 2023/01/04 True 1.28
6 2023/01/04 True 0.28
7 2023/01/04 True -1.72
How should I proceed so that cumsum
can understand that attention must be paid to maintaining a daily pattern according to the results of previous days?
Example Two:
clock_now,competition,market_name,lay
2022/08/09,A,B,-1.0
2022/08/12,A,B,1.28
2022/09/07,A,B,-1.0
2022/10/15,A,B,1.0
2022/10/15,A,B,-1.0
2022/11/20,A,B,1.0
Note that on 2022/10/15
, it is delivering one False
and one True
, so in fact it is not tracking according to the date which is how I want it to happen:
clock_now invest cumulative
0 2022/08/09 False NaN
1 2022/08/12 False -1.00
2 2022/09/07 True 0.28
3 2022/10/15 False -0.72
4 2022/10/15 True 0.28
5 2022/11/20 False -0.72
The correct would be always or all False
or all True
when on equal dates. Like this:
clock_now invest cumulative
0 2022/08/09 False NaN
1 2022/08/12 False -1.00
2 2022/09/07 True 0.28
3 2022/10/15 False -0.72
4 2022/10/15 False 0.28
5 2022/11/20 False -0.72
(df.join(
# Count market&competition specific cumsum for each row
# and join back with df
df.groupby(['market_name', 'competition']).lay.cumsum().rename('lay_cumsum') > 0
)
# Group by market&comp&date to get last cumsum within each day
.groupby(['market_name', 'competition', 'clock_now'])
# Get cumsum Series for each group
.lay_cumsum
# Getting last cumsum within group
.last()
# Group by market&comp
.groupby(['market_name', 'competition'])
# Shift by one to assign to each date prev date's cumsum
.shift(1)
.rename('lay_cumsum')
.reset_index()
# Merge back with original df
.merge(df, on=['clock_now', 'market_name', 'competition']))
This will output
market_name competition clock_now lay_cumsum lay
0 B A 2022/12/30 NaN -1.00
1 B A 2022/12/31 False 1.28
2 B A 2023/01/01 True -1.00
3 B A 2023/01/02 False 1.00
4 B A 2023/01/03 True 1.00
5 B A 2023/01/04 True -1.00
6 B A 2023/01/04 True -1.00
7 B A 2023/01/04 True -1.00