
Attribute change with variable number of time steps


I would like to simulate individual changes in growth and mortality for a variable number of days. My dataframe is formatted as follows...

    import pandas as pd

    data = {'unique_id':  [2, 4, 5, 13],
            'length': [27.7, 30.2, 25.4, 29.1],
            'no_fish': [3195, 1894, 8, 2774],
            'days_left': [253, 253, 254, 256],
            'growth': [0.3898, 0.3414, 0.4080, 0.3839]
           }

    df = pd.DataFrame(data)

    print(df)

       unique_id  length  no_fish  days_left  growth
    0          2    27.7     3195        253  0.3898
    1          4    30.2     1894        253  0.3414
    2          5    25.4        8        254  0.4080
    3         13    29.1     2774        256  0.3839
    

Ideally, I would like the initial length (i.e., length) to increase by the daily growth rate (i.e., growth) for each of the days remaining in the year (i.e., days_left).

    df['final'] = df['length'] + (df['days_left'] * df['growth'])
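That one-liner only works if the columns are numeric. If they hold text (for example values entered as strings such as '27.7', or read from a file without type conversion), a minimal sketch of converting them first with pd.to_numeric and then applying the same linear-growth formula:

```python
import pandas as pd

# Values stored as text, e.g. typed in as strings or read without type conversion
df = pd.DataFrame({
    'length': ['27.7', '30.2', '25.4', '29.1'],
    'days_left': ['253', '253', '254', '256'],
    'growth': ['0.3898', '0.3414', '0.4080', '0.3839'],
})

# Text columns don't support numeric arithmetic ('+' would join the strings),
# so convert each column to a number first
cols = ['length', 'days_left', 'growth']
df[cols] = df[cols].apply(pd.to_numeric)

# Linear growth: final length = initial length + days left * daily growth
df['final'] = df['length'] + df['days_left'] * df['growth']
print(df)
```

For example, the first individual ends at 27.7 + 253 * 0.3898 = 126.3194.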

However, I would also like to update the number of fish that each individual represents (i.e., no_fish) on a daily basis using a size-specific equation. I'm fairly new to Python, so I initially used a for loop (I'm not sure whether there is a more efficient way). My code is as follows:

    import math
    import time

    # keep track of run time - START
    start_time = time.perf_counter()

    df['z'] = 0.0
    for indx in range(len(df)):
        count = 1
        while count <= int(df.days_left[indx]):

            # (1) update individual length
            df.length[indx] = df.length[indx] + df.growth[indx]

            # (2) estimate daily size-specific mortality
            if df.length[indx] > 50.0:
                df.z[indx] = 0.01
            elif df.length[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.length[indx])
            else:
                df.z[indx] = 0.052857-((0.03/35)*df.length[indx])

            # note: round() returns a new Series, so this line leaves df unchanged
            df['no_fish'].round(decimals = 0)
            if df.no_fish[indx] < 1.0:
                df.no_fish[indx] = 0.0
            elif df.no_fish[indx] >= 1.0:
                df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))

            # (3) advance to the next day of the forecast
            count = count + 1

    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))

The above code now works correctly, but it is still far too inefficient to run for 40,000 individuals, each for 200+ days.

I would really appreciate any advice on how to modify the above code to make it more efficient and Pythonic.

Thanks


Solution

As I said in my comment, vector operations are preferable to for loops in this setting. For instance, running your code (with the column names from your sample data):

    import pandas as pd
    import time
    import math
    import numpy as np
    
    data = {'unique_id':  [2, 4, 5, 13],
            'length': [27.7, 30.2, 25.4, 29.1],
            'no_fish': [3195, 1894, 8, 2774],
            'days_left': [253, 253, 254, 256],
            'growth': [0.3898, 0.3414, 0.4080, 0.3839]
           }
    
    df = pd.DataFrame(data)
    
    print(df)
    
    # keep track of run time - START
    start_time = time.perf_counter()
    
    df['z'] = 0.0
    for indx in range(len(df)): 
        count = 1
        while count <= int(df.days_left[indx]):
       
            # (1) update individual length
            df.length[indx] = df.length[indx] + df.growth[indx]
        
            # (2) estimate daily size-specific mortality 
            if df.length[indx] > 50.0:
                df.z[indx] = 0.01
            elif df.length[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.length[indx])
            else:
                df.z[indx] = 0.052857-((0.03/35)*df.length[indx])
        
            # note: round() returns a new Series, so this line leaves df unchanged
            df['no_fish'].round(decimals = 0)
            if df.no_fish[indx] < 1.0:
                df.no_fish[indx] = 0.0
            elif df.no_fish[indx] >= 1.0:
                df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
        
            # (3) advance to the next day of the forecast
            count = count + 1
    
    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
    print(df)
    

with output:

       unique_id  length  no_fish  days_left  growth
    0          2    27.7     3195        253  0.3898
    1          4    30.2     1894        253  0.3414
    2          5    25.4        8        254  0.4080
    3         13    29.1     2774        256  0.3839
    Forecast iteration completed in 31.75 seconds
       unique_id    length     no_fish  days_left  growth     z
    0          2  126.3194  148.729190        253  0.3898  0.01
    1          4  116.5742   93.018465        253  0.3414  0.01
    2          5  129.0320    0.000000        254  0.4080  0.01
    3         13  127.3784  132.864757        256  0.3839  0.01
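The size-specific mortality rule in step (2) is easier to check in isolation as a function that accepts a single length or a whole NumPy array. A minimal sketch; mortality_rate is a hypothetical helper name, and it assumes lengths below 15 are meant to use the exponential curve, with the constant 0.01 above 50 and the linear decline in between:

```python
import numpy as np


def mortality_rate(length):
    """Daily size-specific mortality z for a length (scalar or array-like).

    Assumption: lengths below 15 use the exponential curve, lengths above 50
    the constant rate, and everything in between the linear decline.
    """
    length = np.asarray(length, dtype=float)
    # constant rate for large fish, linear decline otherwise ...
    z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
    # ... then overwrite with the exponential curve for small fish
    return np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)


print(mortality_rate([10.0, 30.0, 60.0]))
```

Every length in the sample data starts above 25 and only grows, so only the linear and constant branches affect the results shown here.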
    

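Now with vector operations, you could loop over the days only and update all individuals that still have days left in one step. The assignments must go through df.loc[...]: chained indexing such as df[update]['length'] = ... writes to a temporary copy, so the original DataFrame is never updated.

    # start again from the original data so the loop above does not affect this run
    df = pd.DataFrame(data)
    # store counts as floats so .loc can write fractional values into the column
    df['no_fish'] = df['no_fish'].astype(float)

    # keep track of run time - START
    start_time = time.perf_counter()
    df['z'] = 0.0
    for day in range(1, df['days_left'].max() + 1):
        # individuals that still have days left to forecast
        update = day <= df['days_left']

        # (1) update individual length
        df.loc[update, 'length'] += df.loc[update, 'growth']
        length = df.loc[update, 'length']

        # (2) estimate daily size-specific mortality
        z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
        z = np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)
        df.loc[update, 'z'] = z

        # (3) apply mortality; groups already below one fish are set to zero
        no_fish = df.loc[update, 'no_fish']
        df.loc[update, 'no_fish'] = np.where(no_fish < 1.0, 0.0, no_fish * np.exp(-z))
    # keep track of run time - END
    total_elapsed_time = round(time.perf_counter() - start_time, 2)
    print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
    print(df)

The code prints the elapsed time (which depends on the machine) followed by the DataFrame below. Python now iterates only over the days (at most 256 here) rather than over every day of every individual, and the final length and no_fish values match the loop version, with z holding each individual's last daily rate:

       unique_id    length     no_fish  days_left  growth     z
    0          2  126.3194  148.729190        253  0.3898  0.01
    1          4  116.5742   93.018465        253  0.3414  0.01
    2          5  129.0320    0.000000        254  0.4080  0.01
    3         13  127.3784  132.864757        256  0.3839  0.01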
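Going one step further, a different technique removes the loop over days entirely. It rests on two properties of this model: growth is linear, so the length on day d is known in advance (length + d * growth), and z is positive for every length, so no_fish only decreases. Under those assumptions, the survivors after day d are no_fish * exp(-(z_1 + ... + z_d)), which a cumulative sum over a days × individuals NumPy array gives for everyone at once. A sketch, not the answer's method; mortality_rate and forecast are hypothetical names, and lengths below 15 are assumed to use the exponential curve:

```python
import numpy as np
import pandas as pd


def mortality_rate(length):
    """Daily mortality z for a length (scalar or array); < 15 uses the exponential curve."""
    z = np.where(length > 50.0, 0.01, 0.052857 - (0.03 / 35) * length)
    return np.where(length < 15.0, 0.728 * np.exp(-0.1892 * length), z)


def forecast(df):
    """Final length and no_fish for every individual, without a loop over days."""
    length0 = df['length'].to_numpy(dtype=float)
    growth = df['growth'].to_numpy(dtype=float)
    n0 = df['no_fish'].to_numpy(dtype=float)
    days_left = df['days_left'].to_numpy(dtype=int)

    # rows = forecast days 1..max(days_left), columns = individuals
    day = np.arange(1, days_left.max() + 1)[:, None]
    active = day <= days_left              # True while an individual is still simulated
    length_d = length0 + day * growth      # length after that day's growth
    z = np.where(active, mortality_rate(length_d), 0.0)

    cum_z = np.cumsum(z, axis=0)
    n_start = n0 * np.exp(-(cum_z - z))    # no_fish at the start of each day
    n_end = n0 * np.exp(-cum_z)            # no_fish at the end of each day

    # the day-by-day loop sets no_fish to 0 once a simulated day starts below
    # one fish, and it stays 0 afterwards, so one such day anywhere is enough
    survives = np.all((n_start >= 1.0) | ~active, axis=0)

    out = df.copy()
    out['length'] = length0 + days_left * growth
    out['no_fish'] = np.where(survives, n_end[-1], 0.0)
    return out


sample = pd.DataFrame({
    'unique_id': [2, 4, 5, 13],
    'length': [27.7, 30.2, 25.4, 29.1],
    'no_fish': [3195, 1894, 8, 2774],
    'days_left': [253, 253, 254, 256],
    'growth': [0.3898, 0.3414, 0.4080, 0.3839],
})
print(forecast(sample))
```

The trade-off is memory: for 40,000 individuals and 256 days each float64 array in the grid takes about 82 MB, and forecast builds several of them, so processing the individuals in chunks may be necessary. Results can also differ from the day-by-day loop in the last few digits, because exp of a sum is not evaluated with exactly the same floating-point steps as repeated multiplication.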