Tags: python, pandas, python-itertools

More efficient iteration method


I'm looking for a better way of doing the following. The code below works, but it's very slow because I'm working with a large dataset. I also tried to use itertools, but I couldn't make it work. So here is my very unpythonic starting point.

Helper function:

def signalbin(x, y):
    # Return 1 when x is greater than y, otherwise -1
    if x > y:
        return 1
    else:
        return -1

Test Data:

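import numpy as np
import pandas as pd

n = 10  # number of rows; the sample output in the answer shows 10
np.random.seed(0)
df = pd.DataFrame(
    {
        'a': np.random.normal(0, 2.5, n),
        'b': np.random.normal(0, 2.5, n),
    }
)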

My current code:

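# Element-wise comparison of columns a and b
df["signal"] = [signalbin(x, y) for x, y in zip(df["a"], df["b"])]
df["signal2"] = df["signal"]
for i, row in df.iterrows():
    if i == 0:
        continue

    # If this row's value differs from the previous row's signal,
    # carry the previous row's signal2 forward
    if row["signal2"] != df.at[i - 1, "signal"]:
        df.at[i, "signal2"] = df.at[i - 1, "signal2"]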

In this case the column signal2 is the desired result.

So I'm looking for a more efficient iteration logic that allows conditions to be placed on multiple columns and rows.


Solution

  • The first part (computing signal) depends on your real function, so it might not be easy to improve.

    The second part (the row-by-row loop) can be vectorized with shift, mask, and ffill.

    # vectorization of the dummy example
    # this might not be possible with a more complex function
    df['signal'] = np.where(df['a'] > df['b'], 1, -1)

    # signal of the previous row (the first row is compared with itself)
    prev = df['signal'].shift(fill_value=df['signal'].iloc[0])

    # mask the rows where the signal changed, then forward-fill the last kept value
    df['signal2'] = (df['signal'].mask(df['signal'].ne(prev)).ffill()
                     .astype(df['signal'].dtype)  # optional: restore the integer dtype
                    )
    

    Output:

              a         b  signal  signal2
    0  4.410131  0.360109       1        1
    1  1.000393  3.635684      -1        1
    2  2.446845  1.902594       1        1
    3  5.602233  0.304188       1        1
    4  4.668895  1.109658       1        1
    5 -2.443195  0.834186      -1        1
    6  2.375221  3.735198      -1       -1
    7 -0.378393 -0.512896       1       -1
    8 -0.258047  0.782669      -1       -1
    9  1.026496 -2.135239       1       -1
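
One way to gain confidence in the vectorized version is to compare it with the row-by-row rule on random data. The sketch below is only an illustration: the function names `signal2_loop` and `signal2_vectorized`, the size `n = 1_000`, and the use of `np.random.default_rng` are assumptions made for this check and are not part of the original question or answer.

```python
import numpy as np
import pandas as pd


def signal2_loop(signal: pd.Series) -> pd.Series:
    """Reference version (hypothetical helper): the row-by-row rule from the question."""
    values = signal.to_numpy()
    out = values.copy()
    for i in range(1, len(values)):
        # If the signal flipped relative to the previous row,
        # carry the previous signal2 forward instead.
        if values[i] != values[i - 1]:
            out[i] = out[i - 1]
    return pd.Series(out, index=signal.index)


def signal2_vectorized(signal: pd.Series) -> pd.Series:
    """Vectorized version (hypothetical helper) using shift, mask and ffill."""
    prev = signal.shift(fill_value=signal.iloc[0])
    return signal.mask(signal.ne(prev)).ffill().astype(signal.dtype)


rng = np.random.default_rng(0)
n = 1_000  # assumed size, chosen only for this check
a = rng.normal(0, 2.5, n)
b = rng.normal(0, 2.5, n)
signal = pd.Series(np.where(a > b, 1, -1))

expected = signal2_loop(signal)
result = signal2_vectorized(signal)

# Both versions should produce identical values and dtype
print(result.equals(expected))
```

If the two results agree, the shift/mask/ffill expression can stand in for the iterrows loop. The same comparison can be rerun after swapping in your real function for the first part.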