Deleting specific row (noise) after detecting specific character in Dataframe


This is a continuation of a previous question of mine (Detecting specific character in Dataframe), but let me explain the problem again.

I have a pandas DataFrame (Python) that looks like this:

        time_s  wow    lat_deg    lon_deg
    0      0.0  0.0  35.042628 -89.978249
    1      2.0  0.0  35.042628 -89.978249
    2      4.0  0.0  35.042628 -89.978249
    3      6.0  0.0  35.042628 -89.978249
    4      8.0    1  35.042628 -89.978249
    5     10.0  0.0  35.042628 -89.978249
    6     12.0  0.0  35.042628 -89.978249
    7     14.0  0.0  35.042628 -89.978249
    8     16.0    1  35.042628 -89.978249
    9     18.0    1  35.042628 -89.978249
    10    20.0  0.0  35.042628 -89.978249
    11    22.0  0.0  35.042628 -89.978249
    ...    ...  ...        ...        ...

The wow column is defined to hold only the values 0 and 1. Unfortunately, my data contains some noise, which makes the total number of entries (rows) higher than it should be (there should be 500 data points, but because of the noise 507 are detected).

Therefore, I want to remove the noise before processing the data.

The original data looks like this:

(...,0,0,0,0,1,1,1,0,1,1,1,1,0,0,0,0,...)

I need to trim the data by deleting any "0" value that sits between two 1s (1,0,1), so that it becomes:

(...,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,...)

How can I do that?


Solution

  • Assuming you only need to get rid of a 0 that is immediately enclosed by a pair of 1s (i.e. only the sequence 1 0 1 is considered), you can build a boolean mask and use boolean indexing to filter the rows, as follows:

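    # True for every 0 whose previous and next rows are both 1
    m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
    # keep only the rows that are not flagged by the mask
    df[~m]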
    

    Demo

    As your sample data doesn't have this scenario, I have modified the data a bit:

    print(df)
    
    
        time_s  wow    lat_deg    lon_deg
    0      0.0  0.0  35.042628 -89.978249
    1      2.0  0.0  35.042628 -89.978249
    2      4.0  0.0  35.042628 -89.978249
    3      6.0  0.0  35.042628 -89.978249
    4      8.0  1.0  35.042628 -89.978249
    5     10.0  0.0  35.042628 -89.978249          <== Matching entry to get rid of 
    6     12.0  1.0  35.042628 -89.978249
    7     14.0  0.0  35.042628 -89.978249          <== Matching entry to get rid of 
    8     16.0  1.0  35.042628 -89.978249
    9     18.0  1.0  35.042628 -89.978249
    10    20.0  0.0  35.042628 -89.978249
    11    22.0  0.0  35.042628 -89.978249
    
    
    m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
    df[~m]
    
    
        time_s  wow    lat_deg    lon_deg
    0      0.0  0.0  35.042628 -89.978249
    1      2.0  0.0  35.042628 -89.978249
    2      4.0  0.0  35.042628 -89.978249
    3      6.0  0.0  35.042628 -89.978249
    4      8.0  1.0  35.042628 -89.978249
    6     12.0  1.0  35.042628 -89.978249
    8     16.0  1.0  35.042628 -89.978249
    9     18.0  1.0  35.042628 -89.978249
    10    20.0  0.0  35.042628 -89.978249
    11    22.0  0.0  35.042628 -89.978249
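
For anyone who wants to run the demo end to end, here is a minimal self-contained sketch. It assumes the DataFrame is rebuilt by hand from the modified sample above; the names `wow` and `cleaned` and the `reset_index` call are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the modified demo frame above.
df = pd.DataFrame({
    "time_s": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0],
    "wow": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "lat_deg": [35.042628] * 12,
    "lon_deg": [-89.978249] * 12,
})

wow = df["wow"]

# shift(1) gives the previous row's value, shift(-1) the next row's value.
# The first and last rows are compared against NaN, so they are never flagged.
m = (wow == 0) & (wow.shift(1) == 1) & (wow.shift(-1) == 1)

# df[~m] returns a new frame and leaves df untouched, so assign the result.
# reset_index(drop=True) is optional; it renumbers the remaining rows 0..n-1.
cleaned = df[~m].reset_index(drop=True)

print(m.sum())                  # 2
print(cleaned["wow"].tolist())  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Note that the mask only catches a single 0 between two 1s. A run of two or more 0s between 1s is left alone, which matches the assumption stated at the top of this answer.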