dplyr & subset: wrong way to remove rows


I'm trying to remove rows from my dataset according to specific values in 2 columns, but it seems that I'm setting the conditions the wrong way.

Here is a sample of the dataset:

   nquest  nord   sex anasc  ireg   eta staciv studio    ID tpens
    <int> <int> <dbl> <int> <int> <int>  <int>  <int> <int> <int>
 1    173     1     1  1948    18    72      3      5     1  1800
 2   2886     1     1  1949    13    71      1      5     2  1211
 3   2886     2     0  1952    13    68      1      6     3  2100
 4   5416     1     0  1958     8    62      3      3     4   700
 5   7886     1     1  1950     9    70      1      5     5  2000
 6  20297     1     1  1960     5    60      1      3     6  1200
 7  20711     2     1  1944     4    76      1      2     7  2000
 8  22169     1     0  1944    15    76      4      2     8   600
 9  22276     1     1  1949     8    71      2      5     9  1200
10  22286     1     1  1950     8    70      1      2    10   850
11  22286     2     0  1956     8    64      1      2    11   650
12  22657     1     0  1951    13    69      1      7    12  2400
13  22657     2     1  1946    16    74      1      5    13  1500
14  23490     1     0  1937     5    83      2      5    14  1400
15  24147     1     1  1948     4    72      1      7    15  1730
16  24147     2     0  1958     4    62      1      5    16  1600
17  24853     1     1  1957    13    63      1      3    17  2180
18  27238     1     1  1952    12    68      1      3    19  1050
19  27730     1     1  1939    20    81      1      2    20  1470
20  27734     1     1  1947    20    73      1      2    21  1159

I want a dataset that excludes every row where tpens is greater than 2000 and ireg == 13. (All rows where ireg is different from 13 must be kept, whatever their tpens value.)

I have tried

new <- subset(data, data$ireg == 13 & data$tpens <= 2000)

But it is wrong: the tpens values are now at most 2000, but the result contains only the rows with ireg == 13. I need to keep all the other values of ireg (and the tpens values linked to them) as well.

I also tried

new <-data [!(data$ireg == 13 & data$tpens <= 2000),]

But it is the same. Even using dplyr's filter, I don't seem to be able to set the conditions properly.

How can I remove the rows that satisfy specific conditions on 2 columns at the same time, without deleting everything else?

I hope I was able to explain myself


Solution

  • subset and filter keep the rows for which the condition is TRUE. So describe the rows you want to drop (ireg == 13 and tpens > 2000) and negate that condition to get the inverse selection:

    filter(data, !(ireg == 13 & tpens > 2000))
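
As a minimal sketch of the same idea in base R, the snippet below rebuilds a few rows of the sample (only the ID, ireg and tpens columns, chosen here for illustration) and applies the negated condition with subset and with bracket indexing. The names kept, kept_alt and kept_index are made up for this example. By De Morgan's law, !(ireg == 13 & tpens > 2000) is the same as ireg != 13 | tpens <= 2000, so either form can be used.

```r
# A few rows taken from the sample in the question (ID, ireg, tpens only)
data <- data.frame(
  ID    = c(1, 2, 3, 5, 12, 17),
  ireg  = c(18, 13, 13, 9, 13, 13),
  tpens = c(1800, 1211, 2100, 2000, 2400, 2180)
)

# Negate the condition that describes the rows to drop
kept <- subset(data, !(ireg == 13 & tpens > 2000))

# Equivalent form via De Morgan's law: keep a row if ireg is not 13
# OR tpens is at most 2000
kept_alt <- subset(data, ireg != 13 | tpens <= 2000)

# Same selection with bracket indexing
kept_index <- data[!(data$ireg == 13 & data$tpens > 2000), ]

print(kept$ID)  # IDs 1, 2 and 5 remain
```

The same condition can be passed unchanged to dplyr's filter, as in the line above. Note that both subset and filter drop rows where the condition evaluates to NA, so rows with a missing ireg or tpens would need separate handling.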