
Delete individuals only if a variable is ALWAYS 0 in panel data?


I have a panel data frame that records how many children each woman has in each survey year. I would like to delete all women who NEVER have children (child is 0 in every year), while keeping women who, e.g., didn't have a child in 2016 but did in 2018. Here's part of the data frame for reference:

ID  year    child
1   2012    0
1   2014    0
1   2016    1
2   2012    0
2   2014    0
2   2016    0
3   2014    1
3   2016    1
4   2012    0
4   2016    1
4   2018    2
5   2018    0
5   2020    0

Can someone help me delete all women who are not mothers?


Solution

  • dplyr option:

    library(dplyr)
    df %>%
      group_by(ID) %>%
      filter(sum(child) >= 1)
    

    Output:

    # A tibble: 8 × 3
    # Groups:   ID [3]
         ID  year child
      <dbl> <dbl> <dbl>
    1     1  2012     0
    2     1  2014     0
    3     1  2016     1
    4     3  2014     1
    5     3  2016     1
    6     4  2012     0
    7     4  2016     1
    8     4  2018     2
    

    As you can see, women 2 and 5, whose child count is 0 in every year, have been removed.

  • base R option:

    df[df$ID %in% df$ID[df$child!=0], ]
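To see why the base R one-liner works, it can be unpacked into its three steps. This is a sketch using the example data from this thread; the intermediate names `mother_ids`, `keep` and `mothers` are only illustrative and not part of the original answer.

```r
# Example data from the question
df <- data.frame(ID    = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                 year  = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016,
                           2012, 2016, 2018, 2018, 2020),
                 child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))

# Step 1: the ID of every row that reports at least one child
# (an ID repeats if the woman has several such rows)
mother_ids <- df$ID[df$child != 0]
mother_ids
#> [1] 1 3 3 4 4

# Step 2: TRUE for every row whose ID appears among those IDs
keep <- df$ID %in% mother_ids

# Step 3: keep all rows of those women, including their 0-child years
mothers <- df[keep, ]
unique(mothers$ID)
#> [1] 1 3 4
```

Because `%in%` matches on the ID rather than on the row, a woman's earlier 0-child waves are kept as long as any of her waves is nonzero.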
    

    Data

    df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                     year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
                     child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))
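A slightly more explicit variant of the dplyr option is sketched below. It states the rule with `any()` instead of `sum()`, adds `ungroup()` so later steps are not silently done per ID, and cross-checks the result against the base R one-liner. The `na.rm = TRUE` choice is an assumption about how missing waves should be treated; the example data has no `NA`s.

```r
library(dplyr)

df <- data.frame(ID    = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                 year  = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016,
                           2012, 2016, 2018, 2018, 2020),
                 child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))

# Keep every row of a woman who reports a child in at least one wave.
# na.rm = TRUE (assumption): an NA wave does not cause a mother to be dropped.
mothers <- df %>%
  group_by(ID) %>%
  filter(any(child > 0, na.rm = TRUE)) %>%
  ungroup()

# Cross-check against the base R option: same women, same rows, same order
mothers_base <- df[df$ID %in% df$ID[df$child != 0], ]
identical(mothers$ID, mothers_base$ID)
identical(mothers$year, mothers_base$year)
```

With `sum(child) >= 1`, a single `NA` in a woman's rows makes the sum `NA`, and `filter()` then drops all of her rows; `any(..., na.rm = TRUE)` avoids that if the real data has gaps.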