
Remove rows in df using multiple conditions in R


Is it possible to remove rows from a data frame based on specific character strings or factor levels in two or more columns? For small datasets this is easy, because I can scroll through the data frame and delete the row I want. How can I do it for larger datasets without scrolling endlessly to find the rows that match my criteria?

Fake data:

# month and common_name (length 2) and site (length 10) are recycled to 20 rows
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

For example, how do I remove only the rows for site "1" in March 2019, in one line of code and without looking up which row they are in?


Solution

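  • You can use subset() with a negated condition: combine the criteria with & and wrap them in !() so that every row matching all of them is dropped and every other row is kept: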

    df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                      month = rep(c("March", "October"), each = 1),
                      site = rep(c("1", "2", "3", "4", "5"), each = 2),
                      common_name = rep(c("Tuna", "shark"), each = 1),
                      num = sample(x = 0:2, size = 20, replace = TRUE))

    # keep every row that does NOT match all three conditions at once
    subset(df1, !(site == "1" & year == 2019 & month == "March"))
    #>    year   month site common_name num
    #> 2  2019 October    1       shark   0
    #> 3  2019   March    2        Tuna   1
    #> 4  2019 October    2       shark   0
    #> 5  2019   March    3        Tuna   0
    #> 6  2019 October    3       shark   0
    #> 7  2019   March    4        Tuna   2
    #> 8  2019 October    4       shark   2
    #> 9  2019   March    5        Tuna   0
    #> 10 2019 October    5       shark   2
    #> 11 2020   March    1        Tuna   1
    #> 12 2020 October    1       shark   1
    #> 13 2020   March    2        Tuna   2
    #> 14 2020 October    2       shark   2
    #> 15 2020   March    3        Tuna   1
    #> 16 2020 October    3       shark   0
    #> 17 2020   March    4        Tuna   1
    #> 18 2020 October    4       shark   0
    #> 19 2020   March    5        Tuna   0
    #> 20 2020 October    5       shark   2
    

    Created on 2022-05-31 by the reprex package (v2.0.1)
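A couple of base R alternatives may also help with the two concerns in the question. This is a sketch that assumes df1 is built exactly as above: which() lists the matching row numbers before anything is dropped, `[` indexing with a negated logical vector performs the same removal as subset(), and a small lookup table removes several year/month/site combinations at once. The names `to_drop` and make_key() are hypothetical, chosen only for this example.

```r
# Recreate the question's fake data (num is random, so its values will vary).
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# 1. Inspect first: which() returns the row numbers matching all three conditions.
hits <- which(df1$site == "1" & df1$year == 2019 & df1$month == "March")
print(hits)  # [1] 1

# 2. Same rows as the subset() call: keep rows where the combined test is FALSE.
keep <- !(df1$site == "1" & df1$year == 2019 & df1$month == "March")
df1_filtered <- df1[keep, ]
print(nrow(df1_filtered))  # [1] 19

# 3. Several combinations at once via a lookup table.
#    `to_drop` and make_key() are hypothetical names used only in this sketch.
to_drop <- data.frame(year = c(2019, 2020),
                      month = c("March", "October"),
                      site = c("1", "3"))
# Build one key string per row from the columns being matched.
make_key <- function(d) paste(d$year, d$month, d$site, sep = "|")
df1_multi <- df1[!(make_key(df1) %in% make_key(to_drop)), ]
print(nrow(df1_multi))  # [1] 18
```

Building the index with ! is safer than df1[-which(...), ]: when nothing matches, -integer(0) selects zero rows and empties the data frame. Note also that subset() treats an NA condition as FALSE, whereas `[` returns an all-NA row for it, so the two can differ if the filtered columns contain missing values.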