Removing all rows for an ID when any of its rows has a missing value in an R data frame

In the following sample data frame (image), I want to remove every row for a given "pid_old" if any row with that ID has a missing value in another column, even for a single year. For example, in the 8th row the value of "wage" is missing, so I need to remove all rows where "pid_old" is 2. I would be grateful if someone could show me how to write this kind of data-cleaning code in R.


Solution

You can do this with tidyverse:

library(tidyverse)
a <- tibble(
  col1 = c("a", NA, "b", "a", "a", "a"),
  col2 = c(1,2,3, 4, 5, NA),
  pid_old = c(1,2,2,3,4,4))

`%notin%` <- Negate(`%in%`)

a %>% filter(
  pid_old %notin% (a %>%
                     filter_all(any_vars(is.na(.))) %>%
                     pull(pid_old)))

Please post a reproducible example next time. You can do this by posting the output of dput(yourdata).
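As a rough illustration of what that looks like, here is a minimal sketch. The data frame is a made-up stand-in: the column names pid_old and wage come from the question, while year and all the values are assumptions for illustration only.

```r
# Hypothetical stand-in for the asker's data (values are invented)
yourdata <- data.frame(
  pid_old = c(1, 1, 2, 2),
  year    = c(2001, 2002, 2001, 2002),
  wage    = c(10, 12, NA, 11)
)

# Prints an R expression that recreates the object;
# paste that printed text into the question
dput(yourdata)
```

Anyone answering can then assign the pasted expression to a variable and work with exactly the same data.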

Explanation:

Extract the vector of pid_old values from every row that contains at least one NA.

a %>% filter_all(any_vars(is.na(.))) %>% pull(pid_old)

Drop every row whose pid_old is in that vector.

a %>% filter(pid_old %notin% c(2, 4))  # c(2, 4) is the vector extracted above for the sample data
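The two steps above can also be written with a named intermediate vector, which may be easier to inspect while debugging. This is only a sketch in base R rather than the tidyverse code from the answer; the names ids_with_na and cleaned are made up for illustration.

```r
# Same sample data as above, as a base R data frame
a <- data.frame(
  col1    = c("a", NA, "b", "a", "a", "a"),
  col2    = c(1, 2, 3, 4, 5, NA),
  pid_old = c(1, 2, 2, 3, 4, 4)
)

# Step 1: IDs that have at least one row with a missing value
ids_with_na <- unique(a$pid_old[!complete.cases(a)])

# Step 2: keep only the rows whose ID is not in that vector
cleaned <- a[!(a$pid_old %in% ids_with_na), ]
```

For the sample data, only the rows with pid_old 1 and 3 remain.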

This line:

`%notin%` <- Negate(`%in%`)

is credited to https://www.r-bloggers.com/the-notin-operator/
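If you would rather not define a custom operator, a grouped filter is one possible alternative. This is a sketch, not the approach from the answer: it groups by pid_old and keeps a group only when none of its rows has an NA. The columns are listed by hand here, so it assumes you know which columns to check.

```r
library(dplyr)

a <- tibble(
  col1    = c("a", NA, "b", "a", "a", "a"),
  col2    = c(1, 2, 3, 4, 5, NA),
  pid_old = c(1, 2, 2, 3, 4, 4)
)

cleaned <- a %>%
  group_by(pid_old) %>%
  # anyNA() returns one TRUE/FALSE per group, so the whole group is kept or dropped
  filter(!anyNA(col1), !anyNA(col2)) %>%
  ungroup()
```

With many columns, listing them one by one gets tedious, and the %notin% version above scales better.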