Search code examples
pandasmatchrowsdelete-row

2 Pandas - Find Unmatched Rows and Delete Extra Rows Not Matching Hour


I need some way to check row by row for differences in the "hour" column so that when df1 "hour" data skips an hour, that this row will be deleted in df2. Then, ultimately the length of df2 after the extra rows are deleted, will match the length of df1. I have tried using isin but it is not doing the job for me possibly because the hours are repeating for each new day. I have 2 df's, df1, and df2. df1 looks like this:

    plant_name  wind_speed_obs  hour  day  month  year
0   BIG HORN I        4.354742     1    1      1  2018
1   BIG HORN I        4.493089     2    1      1  2018
2   BIG HORN I        3.270214     3    1      1  2018
3   BIG HORN I        2.201387     4    1      1  2018
4   BIG HORN I        1.107117     5    1      1  2018
5   BIG HORN I        0.653544     6    1      1  2018
6   BIG HORN I        0.437724     7    1      1  2018
7   BIG HORN I        1.039667     8    1      1  2018
8   BIG HORN I        0.859894     9    1      1  2018
9   BIG HORN I        0.984382    10    1      1  2018
10  BIG HORN I        0.867333    11    1      1  2018
11  BIG HORN I        0.651906    12    1      1  2018
12  BIG HORN I        0.707006    13    1      1  2018
13  BIG HORN I        0.794844    14    1      1  2018
14  BIG HORN I        0.808548    15    1      1  2018
15  BIG HORN I        0.631703    16    1      1  2018
16  BIG HORN I        0.662685    17    1      1  2018
17  BIG HORN I        0.792321    18    1      1  2018
18  BIG HORN I        0.996753    19    1      1  2018
19  BIG HORN I        1.177580    20    1      1  2018
20  BIG HORN I        1.608482    21    1      1  2018
21  BIG HORN I        1.964004    22    1      1  2018
22  BIG HORN I        1.695751    23    1      1  2018
24  BIG HORN I        2.244386     1    2      1  2018
25  BIG HORN I        3.111387     2    2      1  2018

and df2 looks like this:

    plant_name  wind_speed_ms  hour  day  month  year
0   BIG HORN I            3.6     1    1      1  2018
1   BIG HORN I            3.1     2    1      1  2018
2   BIG HORN I            3.1     3    1      1  2018
3   BIG HORN I            2.0     4    1      1  2018
4   BIG HORN I            1.6     5    1      1  2018
5   BIG HORN I            0.8     6    1      1  2018
6   BIG HORN I            0.8     7    1      1  2018
7   BIG HORN I            1.0     8    1      1  2018
8   BIG HORN I            0.3     9    1      1  2018
9   BIG HORN I            0.1    10    1      1  2018
10  BIG HORN I            1.1    11    1      1  2018
11  BIG HORN I            1.9    12    1      1  2018
12  BIG HORN I            1.9    13    1      1  2018
13  BIG HORN I            1.0    14    1      1  2018
14  BIG HORN I            0.7    15    1      1  2018
15  BIG HORN I            2.1    16    1      1  2018
16  BIG HORN I            3.5    17    1      1  2018
17  BIG HORN I            2.1    18    1      1  2018
18  BIG HORN I            1.3    19    1      1  2018
19  BIG HORN I            2.3    20    1      1  2018
20  BIG HORN I            2.8    21    1      1  2018
21  BIG HORN I            3.0    22    1      1  2018
22  BIG HORN I            2.5    23    1      1  2018
23  BIG HORN I            2.2     0    2      1  2018
24  BIG HORN I            3.9     1    2      1  2018
25  BIG HORN I            4.3     2    2      1  2018
26  BIG HORN I            3.5     3    2      1  2018

after finding the unmatched hour in column "hour" of df2 (see index = 23 above) which has a "0" hour row that is not found in df1, the df2 dataframe should look like this with the "0" hour row removed: new df2: -- thank you!

    plant_name  wind_speed_ms  hour  day  month  year
0   BIG HORN I            3.6     1    1      1  2018
1   BIG HORN I            3.1     2    1      1  2018
2   BIG HORN I            3.1     3    1      1  2018
3   BIG HORN I            2.0     4    1      1  2018
4   BIG HORN I            1.6     5    1      1  2018
5   BIG HORN I            0.8     6    1      1  2018
6   BIG HORN I            0.8     7    1      1  2018
7   BIG HORN I            1.0     8    1      1  2018
8   BIG HORN I            0.3     9    1      1  2018
9   BIG HORN I            0.1    10    1      1  2018
10  BIG HORN I            1.1    11    1      1  2018
11  BIG HORN I            1.9    12    1      1  2018
12  BIG HORN I            1.9    13    1      1  2018
13  BIG HORN I            1.0    14    1      1  2018
14  BIG HORN I            0.7    15    1      1  2018
15  BIG HORN I            2.1    16    1      1  2018
16  BIG HORN I            3.5    17    1      1  2018
17  BIG HORN I            2.1    18    1      1  2018
18  BIG HORN I            1.3    19    1      1  2018
19  BIG HORN I            2.3    20    1      1  2018
20  BIG HORN I            2.8    21    1      1  2018
21  BIG HORN I            3.0    22    1      1  2018
22  BIG HORN I            2.5    23    1      1  2018
24  BIG HORN I            3.9     1    2      1  2018
25  BIG HORN I            4.3     2    2      1  2018
26  BIG HORN I            3.5     3    2      1  2018

Solution

  • use isin for all date/time columns:

    df2 = df2[df2['hour'].isin(df1['hour']) &
              df2['day'].isin(df1['day']) &
              df2['month'].isin(df1['month']) & 
              df2['year'].isin(df1['year'])]
    df2
    Out[1]: 
        plant_name  wind_speed_ms  hour  day  month  year
    0   BIG HORN I            3.6     1    1      1  2018
    1   BIG HORN I            3.1     2    1      1  2018
    2   BIG HORN I            3.1     3    1      1  2018
    3   BIG HORN I            2.0     4    1      1  2018
    4   BIG HORN I            1.6     5    1      1  2018
    5   BIG HORN I            0.8     6    1      1  2018
    6   BIG HORN I            0.8     7    1      1  2018
    7   BIG HORN I            1.0     8    1      1  2018
    8   BIG HORN I            0.3     9    1      1  2018
    9   BIG HORN I            0.1    10    1      1  2018
    10  BIG HORN I            1.1    11    1      1  2018
    11  BIG HORN I            1.9    12    1      1  2018
    12  BIG HORN I            1.9    13    1      1  2018
    13  BIG HORN I            1.0    14    1      1  2018
    14  BIG HORN I            0.7    15    1      1  2018
    15  BIG HORN I            2.1    16    1      1  2018
    16  BIG HORN I            3.5    17    1      1  2018
    17  BIG HORN I            2.1    18    1      1  2018
    18  BIG HORN I            1.3    19    1      1  2018
    19  BIG HORN I            2.3    20    1      1  2018
    20  BIG HORN I            2.8    21    1      1  2018
    21  BIG HORN I            3.0    22    1      1  2018
    22  BIG HORN I            2.5    23    1      1  2018
    24  BIG HORN I            3.9     1    2      1  2018  #row with index of 23 removed
    25  BIG HORN I            4.3     2    2      1  2018
    26  BIG HORN I            3.5     3    2      1  2018