
How to replace a value in a pandas column with the previous row's value if a condition is met


I'm trying to speed up my trading strategy backtesting.

Right now, I have this loop:

from tqdm import trange

# Replace each "HOLD" with the previous row's value, then mark each row CORRECT/INCORRECT.
for i in trange(1, len(real_choice), disable=not backtesting, desc="Converting HOLDs and calculating backtest correct/incorrect... [3/3]"):
    if advice[i] == "HOLD":
        advice[i] = advice[i - 1]
    if real_choice[i] == "HOLD":
        real_choice[i] = real_choice[i - 1]

    if advice[i] == real_choice[i]:
        correct[i] = "CORRECT"
    else:
        correct[i] = "INCORRECT"

This part of the code takes the longest, so I want to speed it up.

I'm learning Python, so this approach was simple and it worked, but now I'm paying for it in how long the backtests take.

Is there a way to do this faster?


Solution

  • You can use np.where to compare two columns element-wise and pick a value for each row:

    correct = np.where(advice == real_choice, "CORRECT", "INCORRECT")
    

    To make it more idiomatic pandas, assign the result to a DataFrame column:

    df['correct'] = np.where(df['advice'] == df['real_choice'], "CORRECT", "INCORRECT")
    

    Here is a timing comparison (full code):

    import time

    import numpy as np
    import pandas as pd
    from numpy.random import randint

    A = randint(0, 10, 10000)
    B = randint(0, 10, 10000)

    df = pd.DataFrame({'A': A, 'B': B, 'C': "INCORRECT"})
    print(df)

    # Method 1: row-by-row Python loop (.at avoids chained-assignment warnings)
    start = time.process_time()
    for i in range(len(df)):
        if df.at[i, 'A'] == df.at[i, 'B']:
            df.at[i, 'C'] = "CORRECT"
        else:
            df.at[i, 'C'] = "INCORRECT"
    print("method 1", time.process_time() - start)

    # Method 2: vectorized comparison with np.where
    start = time.process_time()
    df['C2'] = np.where(df['A'] == df['B'], "CORRECT", "INCORRECT")
    print("method 2", time.process_time() - start)
    
    

    Method 2 (vectorized) was over 400 times faster in this run:

    method 1 1.0530679999999997
    method 2 0.0022619999999999862
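The np.where line covers the CORRECT/INCORRECT comparison, but the question's loop also replaces each "HOLD" with the previous row's value. A minimal sketch of vectorizing that step too, assuming advice and real_choice are pandas Series with a default integer index and no pre-existing missing values: mask every "HOLD" to NaN, forward-fill with ffill (a technique swapped in here, not part of the original answer), then compare with np.where. The function names backtest_loop and backtest_vectorized are hypothetical, and the loop version is just the question's logic without tqdm, kept only to check that both give the same result.

```python
import numpy as np
import pandas as pd


def backtest_loop(advice: pd.Series, real_choice: pd.Series) -> pd.Series:
    """Hypothetical reference: the question's loop, minus tqdm."""
    advice = advice.copy()
    real_choice = real_choice.copy()
    correct = pd.Series("", index=advice.index, dtype=object)
    for i in range(1, len(real_choice)):
        if advice.iat[i] == "HOLD":
            advice.iat[i] = advice.iat[i - 1]
        if real_choice.iat[i] == "HOLD":
            real_choice.iat[i] = real_choice.iat[i - 1]
        correct.iat[i] = "CORRECT" if advice.iat[i] == real_choice.iat[i] else "INCORRECT"
    return correct


def backtest_vectorized(advice: pd.Series, real_choice: pd.Series) -> pd.Series:
    """Hypothetical vectorized version (assumes no NaN in the inputs)."""
    # "HOLD" -> NaN, then forward-fill so each HOLD takes the last non-HOLD
    # value before it; this also handles runs of consecutive HOLDs.
    # Leading HOLDs have nothing to fill from, so fillna restores them as
    # "HOLD", matching the loop, which never changes row 0.
    advice_filled = advice.mask(advice == "HOLD").ffill().fillna("HOLD")
    real_filled = real_choice.mask(real_choice == "HOLD").ffill().fillna("HOLD")
    return pd.Series(
        np.where(advice_filled == real_filled, "CORRECT", "INCORRECT"),
        index=advice.index,
    )


rng = np.random.default_rng(0)
choices = np.array(["BUY", "SELL", "HOLD"])
advice = pd.Series(rng.choice(choices, 1000))
real_choice = pd.Series(rng.choice(choices, 1000))

loop_result = backtest_loop(advice, real_choice)
fast_result = backtest_vectorized(advice, real_choice)

# The loop starts at row 1, so compare from row 1 onward.
print((loop_result.iloc[1:] == fast_result.iloc[1:]).all())  # True
```

Unlike the loop, this version does not modify advice and real_choice in place; if later code relies on the filled values, keep advice_filled and real_filled instead of discarding them.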