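Here's the Python code:
import numpy as np

t0 = df['Temperature'].iloc[0]  # DataFrame df with column 'Temperature' is already given
df['DriftedTemp'] = None
col = df.columns.get_loc('DriftedTemp')  # positional index of the target column
for i in range(1, len(df)):
    if np.abs(df['Temperature'].iloc[i] - t0) > toffset:  # toffset is a parameter that is given
        # Single-step positional write; df['DriftedTemp'].iloc[i] = ... is chained assignment
        df.iloc[i, col] = df['Temperature'].iloc[i]
        t0 = df['Temperature'].iloc[i]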
It finds the rows where the temperature has drifted from the previously recorded value by more than "toffset". At each such row it writes the new value to the "DriftedTemp" column and also resets "t0" to the "Temperature" at that row.
The issue with code like this is that the result for the current row depends on state carried over from earlier rows: "t0" changes every time a drift is detected. Vectorization treats each column as a whole vector, so state that changes partway through the rows is not reflected by a simple vectorized expression.
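To make the dependence concrete, here is a small sketch with made-up readings (the sample values and `toffset = 2` are assumptions for illustration, not data from the question). A naive vectorized comparison of each row with the row before it flags nothing, while the stateful loop catches the slow drift because it compares against the last recorded value.

```python
import numpy as np

temps = np.array([20.0, 21.5, 23.0, 24.5, 24.0])  # made-up sample readings
toffset = 2  # assumed threshold

# Naive vectorization: compare each row with the row just before it.
step = np.abs(np.diff(temps, prepend=temps[0]))
naive = np.where(step > toffset, temps, np.nan)

# Stateful loop: compare each row with the last recorded value t0.
stateful = np.full(temps.shape, np.nan)
t0 = temps[0]
for i in range(1, len(temps)):
    if abs(temps[i] - t0) > toffset:
        stateful[i] = t0 = temps[i]

print(naive)     # no row is flagged: every step is 1.5 or less
print(stateful)  # only index 2 is flagged (23.0 is 3.0 away from 20.0)
```

Comparing against the first reading instead of the previous row would fail in the other direction, since the reference is supposed to move each time a drift is recorded.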
This can be implemented using a while loop combined with vectorization, but I cannot think of a simple vectorization technique without any loops that accomplishes the same task.
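One possible reading of the "while loop and vectorization" idea is sketched below, using NumPy only. The function name `drift_while` and the sample readings are hypothetical. The loop runs once per drift event rather than once per row, and each iteration uses a vectorized comparison to find the next row that drifted.

```python
import numpy as np

def drift_while(temperatures, toffset):
    """Hypothetical helper: iterate once per drift event, not once per row."""
    temps = np.asarray(temperatures, dtype=float)
    out = np.full(temps.shape, np.nan)
    if temps.size == 0:
        return out
    t0 = temps[0]  # the first reading is the initial reference
    start = 1
    while start < temps.size:
        # Vectorized comparison of all remaining rows against the reference
        hits = np.abs(temps[start:] - t0) > toffset
        if not hits.any():
            break  # no further drift in the rest of the series
        i = start + int(np.argmax(hits))  # first remaining row that drifted
        out[i] = t0 = temps[i]  # record the drift and reset the reference
        start = i + 1
    return out

result = drift_while([20.0, 20.5, 23.0, 23.5, 19.0, 19.5], 2)
print(result)  # drifts recorded at index 2 (23.0) and index 4 (19.0)
```

This helps when drifts are rare. If almost every row drifts, each iteration rescans the remaining rows, so the total work grows roughly quadratically and a plain loop may be faster.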
Vectorization might not be possible, since the computation of each drift depends on the previous state of drift. Having said that, this is a good use case for numba: create a function with the logic and compile it with numba to achieve C-like speeds.
import numba
import numpy as np

@numba.njit
def drift(temperatures, toffset):
    drift = np.full_like(temperatures, fill_value=np.nan, dtype='float')
    for i, t in enumerate(temperatures):
        if i == 0:
            t0 = t  # the first reading is the initial reference
        elif abs(t - t0) > toffset:
            t0 = drift[i] = t  # record the drift and reset the reference
    return drift

df['DriftedTemp'] = drift(df['Temperature'].to_numpy(), 2)  # 2 is an example toffset