I need some way to check row by row for differences in the "hour" column so that when df1 "hour" data skips an hour, that this row will be deleted in df2. Then, ultimately the length of df2 after the extra rows are deleted, will match the length of df1. I have tried using isin but it is not doing the job for me possibly because the hours are repeating for each new day. I have 2 df's, df1, and df2. df1 looks like this:
plant_name wind_speed_obs hour day month year
0 BIG HORN I 4.354742 1 1 1 2018
1 BIG HORN I 4.493089 2 1 1 2018
2 BIG HORN I 3.270214 3 1 1 2018
3 BIG HORN I 2.201387 4 1 1 2018
4 BIG HORN I 1.107117 5 1 1 2018
5 BIG HORN I 0.653544 6 1 1 2018
6 BIG HORN I 0.437724 7 1 1 2018
7 BIG HORN I 1.039667 8 1 1 2018
8 BIG HORN I 0.859894 9 1 1 2018
9 BIG HORN I 0.984382 10 1 1 2018
10 BIG HORN I 0.867333 11 1 1 2018
11 BIG HORN I 0.651906 12 1 1 2018
12 BIG HORN I 0.707006 13 1 1 2018
13 BIG HORN I 0.794844 14 1 1 2018
14 BIG HORN I 0.808548 15 1 1 2018
15 BIG HORN I 0.631703 16 1 1 2018
16 BIG HORN I 0.662685 17 1 1 2018
17 BIG HORN I 0.792321 18 1 1 2018
18 BIG HORN I 0.996753 19 1 1 2018
19 BIG HORN I 1.177580 20 1 1 2018
20 BIG HORN I 1.608482 21 1 1 2018
21 BIG HORN I 1.964004 22 1 1 2018
22 BIG HORN I 1.695751 23 1 1 2018
24 BIG HORN I 2.244386 1 2 1 2018
25 BIG HORN I 3.111387 2 2 1 2018
and df2 looks like this:
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
23 BIG HORN I 2.2 0 2 1 2018
24 BIG HORN I 3.9 1 2 1 2018
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018
after finding the unmatched hour in column "hour" of df2 (see index = 23 above) which has a "0" hour row that is not found in df1, the df2 dataframe should look like this with the "0" hour row removed: new df2: -- thank you!
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
24 BIG HORN I 3.9 1 2 1 2018
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018
use isin
for all date/time columns:
df2 = df2[df2['hour'].isin(df1['hour']) &
df2['day'].isin(df1['day']) &
df2['month'].isin(df1['month']) &
df2['year'].isin(df1['year'])]
df2
Out[1]:
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
24 BIG HORN I 3.9 1 2 1 2018 #row with index of 23 removed
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018