How can I randomly pick 3 values of var1 within each 'DATE' group, sum them, and repeat this for several simulations?
df =
DATE var1
2023-01-31 1
2023-01-31 2
2023-01-31 3
2023-01-31 4
2023-01-31 5
2023-02-28 6
2023-02-28 7
2023-02-28 8
2023-02-28 9
2023-02-28 10
Simulation 1:
2023-01-31 = (1+3+5) = 9
2023-02-28 = (6+7+10) = 23
Simulation 2:
2023-01-31 = (1+2+5) = 8
2023-02-28 = (9+7+10) = 26
Simulation n: ...
Let's say we run 10 simulations, for instance.
You can use groupby.agg with sample:
out = df.groupby('DATE').agg(lambda g: g.sample(n=3).sum())
Example output:
            var1
DATE
2023-01-31     8
2023-02-28    27
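For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question and passes a seeded NumPy generator as random_state so the draw is reproducible. The seed value is arbitrary, and passing a Generator to sample assumes a reasonably recent pandas (1.4 or later, as far as I know).

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'DATE': ['2023-01-31'] * 5 + ['2023-02-28'] * 5,
    'var1': range(1, 11),
})

# Seeded generator (the seed 0 is an arbitrary choice). One shared generator
# is advanced by every call, so each group gets its own draw.
rng = np.random.default_rng(0)

# For each DATE, draw 3 values of var1 without replacement and sum them.
out = df.groupby('DATE').agg(lambda g: g.sample(n=3, random_state=rng).sum())
print(out)
```

Whatever the seed, the sum for 2023-01-31 must fall between 6 (1+2+3) and 12 (3+4+5), and the sum for 2023-02-28 between 21 and 27, which is a quick sanity check on the result.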
If you want to repeat the process, use a loop:
N = 10
for i in range(N):
    print(f'simulation {i+1}')
    print(df.groupby('DATE').agg(lambda g: g.sample(n=3).sum()))
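To collect all simulations in a single DataFrame instead of printing them (here restricted to one date with query), use concat:
N = 10
query = 'DATE == "2023-01-31"'
out = pd.concat({i+1: df.query(query).groupby('DATE').agg(lambda g: g.sample(n=3).sum())
                 for i in range(N)
                }, names=['simulation'])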
Example output:
                       var1
simulation DATE
1          2023-01-31     8
2          2023-01-31    10
3          2023-01-31    12
4          2023-01-31     8
5          2023-01-31     9
6          2023-01-31    10
7          2023-01-31    11
8          2023-01-31    12
9          2023-01-31    10
10         2023-01-31     6
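If you want every date rather than a single one, the same idea can be wrapped in a small function. The sketch below is only one possible way to do it: simulate_sums, n_sim, k and seed are names I made up for illustration, and it swaps the agg/lambda for GroupBy.sample (available in pandas 1.1 and later), which draws k rows per group directly. Passing a NumPy Generator as random_state again assumes pandas 1.4 or later.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    'DATE': ['2023-01-31'] * 5 + ['2023-02-28'] * 5,
    'var1': range(1, 11),
})

def simulate_sums(df, n_sim=10, k=3, seed=None):
    """Hypothetical helper: per DATE, draw k rows without replacement,
    sum var1, and repeat n_sim times."""
    rng = np.random.default_rng(seed)
    runs = {
        i + 1: (df.groupby('DATE')
                  .sample(n=k, random_state=rng)  # k random rows per DATE
                  .groupby('DATE')['var1']
                  .sum())
        for i in range(n_sim)
    }
    # Each run is a Series indexed by DATE; transposing gives one row per
    # simulation and one column per date.
    return pd.DataFrame(runs).T.rename_axis(index='simulation')

res = simulate_sums(df, n_sim=10, k=3, seed=42)
print(res)
```

The wide layout (one row per simulation, one column per date) makes follow-up steps such as res.mean() or res.describe() straightforward. If you prefer the long layout shown above, res.stack() should get you back to a (simulation, DATE) index.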