I have sample dataframe as follows
I just want to mark all the rows where id == 11 preeceded by 2. If the id ==11 and does not preeced by 2 in the immediate rows I want to mark it as 0
Home" Date Time id Appliance Banana expected_output output_from the code
1 1/21/2017 1:30:00 11 Apple 0 1 1
2 1/21/2017 1:45:00 11 Apple 0 1 1
3 1/21/2017 2:00:00 11 Apple 0 1 1
4 1/21/2017 2:15:00 2 Banana 1 1 0
5 1/21/2017 2:30:00 2 Banana 0 0 0
6 1/21/2017 2:45:00 0 Orange 0 0 0
7 1/21/2017 3:00:00 1 Peach 0 0 0
8 1/21/2017 3:15:00 1 Peach 0 0 0
9 1/21/2017 3:30:00 3 Pineapple 0 0 0
10 1/21/2017 3:45:00 3 Pineapple 0 0 0
11 1/21/2017 4:00:00 11 Apple 0 0 1
12 1/21/2017 4:15:00 11 Apple 0 0 1
13 1/21/2017 4:30:00 11 Apple 0 0 1
14 1/21/2017 4:45:00 0 Orange 0 0 0
15 1/22/2017 3:30:00 1 Peach 0 0 0
16 1/22/2017 3:45:00 1 Peach 0 0 0
17 1/22/2017 4:00:00 3 Pineapple 0 0 0
18 1/22/2017 4:15:00 3 Pineapple 0 0 0
19 1/22/2017 4:30:00 11 Apple 0 1 1
20 1/22/2017 4:45:00 11 Apple 0 1 1
21 1/22/2017 5:00:00 11 Apple 0 1 1
22 1/22/2017 5:15:00 2 Banana 1 1 0
23 1/22/2017 5:30:00 2 Banana 1 0 0
what I have achieved so far
df['Banana'] = np.where((df['id']==2) & (df['id'].shift(+1)==11), 1,
0)
formatted_df['output_from the code'] = np.where((df['id']==11) & (df['id'].shift(-1)==2), 1,
np.where((df['id']==11) & (df['id'].shift(-1)==11), 1,
0))
is there a way to write np.where based on previous row
I have a utility library I maintain called haggis. It has two functions that convert an array of booleans to an array of run markers and vice versa called haggis.math.mask2runs
and haggis.math.runs2mask
, respectively. The functions are inverses of each other. mask2runs
works like this:
>>> mask2runs([0, 1, 0, 1, 1, 1, 0, 1, 1])
array([[1, 2],
[3, 6],
[7, 9]], dtype=int64)
The first column is the start indices of each run of True elements, and the second column is the exclusive end. You can use this directly:
# Grab locations of 11s
runs = mask2runs(df.id == 11)
# remove the ones that aren't followed by 2
runs = runs[df.id.iloc[np.clip(runs[:, 1], 0, len(df) - 1)] == 2]
# Convert remaining runs back to mask
df.output = runs2mask(runs, n=len(df))
You don't need to import haggis to use these functions. They are actually pretty simple, especially in this limited usecase. The source is here, but you can also summarize them as follows specifically for this problem:
def mask2runs(mask):
return np.flatnonzero(np.diff(np.r_[np.int8(0), mask.view(np.int8) np.int8(0)])).reshape(-1, 2)
def runs2mask(runs, n):
mask = np.zeros(n, dtype=bool)
view = mask.view(np.int8)
view[runs[:, 0]] = 1
view[runs[:-1, 1] if runs[-1, 1] == n else runs[:, 1]] = -1
np.cumsum(view, out=view)
return mask