
R - delete rows according to the value of another row


I am fairly new to R, but thanks to the Stack Overflow community I am improving! However, I am stuck on a problem:

I have a dataset with 5 variables; the three relevant ones are:

  • id_house is the id of each household
  • id_ind is an id that takes the value 1 for the first individual in the household, 2 for the second, 3 for the third, and so on
  • indicator_tb_men indicates whether the first person answered the survey (1 = yes, 0 = no). All other members of the household take the value 0.

id_house    id_ind   indicator_tb_men
1             1       1
1             2       0
2             1       1
3             1       0
3             2       0
3             3       0
4             1       1
5             1       0

I would like to delete all members of households where the first individual has not answered the survey.

So it would give:

id_house    id_ind   indicator_tb_men
1             1       1
1             2       0
2             1       1
4             1       1

Solution

  • Using dplyr, here is one way:

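    library(dplyr)
    
    df %>%
      # sort so that individual 1 is the first row of each household
      arrange(id_house, id_ind) %>%
      group_by(id_house) %>%
      # keep a household only if its first member's indicator is not 0
      filter(first(indicator_tb_men) != 0)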
    
    #  id_house id_ind indicator_tb_men
    #     <int>  <int>            <int>
    #1        1      1                1
    #2        1      2               NA
    #3        2      1                1
    #4        4      1                1
    

    Data (in this sample, household members other than the first are coded NA rather than 0):

    df <- structure(list(id_house = c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 5L), 
    id_ind = c(1L, 2L, 1L, 1L, 2L, 3L, 1L, 1L), indicator_tb_men = c(1L, 
    NA, 1L, 0L, NA, NA, 1L, 0L)), class = "data.frame", row.names = c(NA, -8L))
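
For readers who would rather not depend on dplyr, the same filter can be sketched in base R. This is an alternative to the answer above, not part of it. It assumes that id_ind == 1 always marks the first individual of a household, and the names `answered` and `result` are only illustrative.

```r
# Same sample data as in the answer above
df <- data.frame(
  id_house = c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 5L),
  id_ind = c(1L, 2L, 1L, 1L, 2L, 3L, 1L, 1L),
  indicator_tb_men = c(1L, NA, 1L, 0L, NA, NA, 1L, 0L)
)

# Ids of households whose first individual answered the survey.
# %in% is used instead of == so that an NA indicator counts as "not answered".
answered <- df$id_house[df$id_ind == 1 & df$indicator_tb_men %in% 1]

# Keep every member of those households (here: households 1, 2 and 4)
result <- df[df$id_house %in% answered, ]
result
```

Unlike the dplyr version, this does not rely on the rows being sorted, because the first individual is identified by id_ind rather than by position.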