pandas, numpy, random

Randomly pick values within each date group (groupby)


How can I randomly pick 3 values of var1 within each 'DATE' group, sum them, and then repeat this for several simulations?

df =
             
    DATE       var1 
    2023-01-31  1
    2023-01-31  2
    2023-01-31  3
    2023-01-31  4
    2023-01-31  5
    2023-02-28  6
    2023-02-28  7
    2023-02-28  8
    2023-02-28  9
    2023-02-28  10

    Simulation 1:
    2023-01-31 = (1+3+5) = 9
    2023-02-28 = (6+7+10) = 23

    Simulation 2:
    2023-01-31 = (1+2+5) = 8
    2023-02-28 = (9+7+10) = 26
    
    simulation n.......

Let's say we run 10 simulations, for instance.


Solution

  • You can use groupby.agg with sample:

    out = df.groupby('DATE').agg(lambda g: g.sample(n=3).sum())
    

    Example output:

                var1
    DATE            
    2023-01-31     8
    2023-02-28    27
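
Because sample draws at random, the sums above change on every run. If reproducible results are needed, one option is to pass a seeded random state to sample. The following is a minimal sketch, assuming the frame from the question; the helper name sample_sums and the seed value are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})


def sample_sums(frame, n=3, seed=0):
    """Sum n randomly drawn values per DATE group.

    sample_sums and seed are hypothetical names used for illustration.
    """
    # One shared RandomState: each group advances it, so the groups get
    # different draws, while the overall result is repeatable per seed.
    rng = np.random.RandomState(seed)
    return frame.groupby("DATE").agg(
        lambda g: g.sample(n=n, random_state=rng).sum()
    )


out = sample_sums(df, seed=42)
print(out)
```

sample draws without replacement by default, so each sum is built from 3 distinct rows of the group. Calling sample_sums again with the same seed should return the same sums.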
    

    If you want to repeat the process, use a loop:

    N = 10
    
    for i in range(N):
        print(f'simulation {i+1}')
        print(df.groupby('DATE').agg(lambda g: g.sample(n=3).sum()))
    
    To collect the repeated samples in a single DataFrame (here restricted to one date with query):

    import pandas as pd

    N = 10
    query = 'DATE == "2023-01-31"'
    
    out = pd.concat({i+1: df.query(query).groupby('DATE').agg(lambda g: g.sample(n=3).sum())
                     for i in range(N)
                     }, names=['simulation'])
    

    Example output:

                           var1
    simulation DATE            
    1          2023-01-31     8
    2          2023-01-31    10
    3          2023-01-31    12
    4          2023-01-31     8
    5          2023-01-31     9
    6          2023-01-31    10
    7          2023-01-31    11
    8          2023-01-31    12
    9          2023-01-31    10
    10         2023-01-31     6
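
The output above covers a single date because of the query filter. To keep every date, as in the question's expected result, one possible variant is to drop the filter and place the simulations side by side as columns. This is a sketch under assumptions: the names sims and rng, the seed, and the wide layout are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

N = 10
rng = np.random.RandomState(0)  # seed chosen only for repeatability

# One Series of per-date sums per simulation, concatenated as columns.
sims = pd.concat(
    {
        i + 1: df.groupby("DATE")["var1"].agg(
            lambda s: s.sample(n=3, random_state=rng).sum()
        )
        for i in range(N)
    },
    axis=1,
).rename_axis(columns="simulation")

print(sims)               # rows: DATE, columns: simulation 1..N
print(sims.mean(axis=1))  # average sampled sum per date
```

With dates as rows and simulations as columns, per-date summaries such as the mean across simulations are a single call along axis=1.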