Deleting/ rows if rows do not have atleast 2 recordings for multiple variable

Question 1) Let's say I have longitudinal data for 4 participants over 4 years i.e. years 0,1,3,4. My goal is to

check if the data has outcome variables (n1) recorded for at least any 2 timepoints.
If only one recording is present then delete; otherwise keep it.
repeat 1) & 2) for multiple outcome variables (m1)

data I have

ID  visit   n1  m1
1   0   5.6 0
1   1   1.5 NA
1   3   0.5 NA
1   4   NA  NA
2   0   6   1
2   1   NA  0
2   3   NA  0
2   4   NA  0
3   0   3.4 0
3   1   2.4 0
3   3   2.5 0
3   4   1   1
4   0   NA  NA
4   1   NA  NA
4   3   NA  NA
4   4   3.3 0

this is what I want

data 1       
ID  visit   n1
1   0   5.6
1   1   1.5
1   3   0.5
1   4   NA
3   0   3.4
3   1   2.4
3   3   2.5
3   4   1

data2        
ID  visit   m1
2   0   1
2   1   0
2   3   0
2   4   0
3   0   0
3   1   0
3   3   0
3   4   1

or this form where we create new variable n12 (0= <2 values present for n1 Vs. 1= >=2 values present for n1) & similarly m12. Later I can delete rows based on the values of these new variables n12 & m12.

ID  visit   n1  m1  n12 m12
1    0   5.6 0  1   0
1    1   1.5 NA 1   0
1    3   0.5 NA 1   0
1    4   NA  NA 1   0
2    0   6   1  0   1
2    1   NA  0  0   1
2    3   NA  0  0   1
2    4   NA  0  0   1
3    0   3.4 0  1   1
3    1   2.4 0  1   1
3    3   2.5 0  1   1
3    4   1   1  1   1
4    0   NA  NA 0   0
4    1   NA  NA 0   0
4    3   NA  NA 0   0
4    4   3.3 0  0   0

I tried Deleting incomplete cases across multiple rows in R studio but the following code gives me '0' observations in mydata as it will delete rows even if single NA is found across rows

mydata = mydata[!mydata$ID %in% mydata[!complete.cases(mydata) ,]$ID, ]

library(plyr)
# counts all the IDs
cnt = count(mydata, "ID")
# Eliminates any ID that doesn't have 2 observations
mydata[mydata$ID %in% cnt[cnt$freq == 2, ]$ID, ]

I also tried the long to wide format this did not work as I guess value in my case are multiple variables

library(dplyr)    
mydata <- mydata %>%
tidyr::spread(key=time, value=value) %>% # reformat to wide
na.omit() %>% # delete cases with missingness on any variable (i.e. any time point)
tidyr::gather(key="time", value="value", -ID) # put it back in long format

New Question 2): how should I code if I want to have only rows with visit =0 value of n1 and at least one other visit (1/3/4) recording for n1? to get data like this:

ID  visit   n1  
1   0   5.6 
1   1   1.5 
1   3   0.5 
1   4   NA  
3   0   3.4 
3   1   2.4 
3   3   2.5 
3   4   1

Kindly suggest R syntax or method to achieve goals Thanks!

Solution

dat <- structure(list(
  ID = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"),
  visit = c("0", "1", "3", "4", "0", "1", "3", "4", "0", "1", "3", "4", "0", "1", "3","4"),
  n1 = c("5.6", "1.5", "0.5", NA, "6", NA, NA, NA, "3.4","2.4", "2.5", "1", NA, NA, NA, "3.3"),
  m1 = c("0", NA, NA, NA, "1", "0", "0", "0", "0", "0", "0", "1", NA, NA, NA, "0")),
  row.names = 2:17, class = "data.frame")

library(dplyr)

dat %>%
  group_by(ID) %>%
  filter(sum(!is.na(n1)) >= 2) %>%
  assign("data1", ., inherits = TRUE)
data1
#> # A tibble: 8 x 4
#> # Groups:   ID [2]
#>   ID    visit n1    m1   
#>   <chr> <chr> <chr> <chr>
#> 1 1     0     5.6   0    
#> 2 1     1     1.5   <NA> 
#> 3 1     3     0.5   <NA> 
#> 4 1     4     <NA>  <NA> 
#> 5 3     0     3.4   0    
#> 6 3     1     2.4   0    
#> 7 3     3     2.5   0    
#> 8 3     4     1     1

dat %>%
  group_by(ID) %>%
  filter(sum(!is.na(m1)) >= 2) %>%
  assign("data2", ., inherits = TRUE)
data2
#> # A tibble: 8 x 4
#> # Groups:   ID [2]
#>   ID    visit n1    m1   
#>   <chr> <chr> <chr> <chr>
#> 1 2     0     6     1    
#> 2 2     1     <NA>  0    
#> 3 2     3     <NA>  0    
#> 4 2     4     <NA>  0    
#> 5 3     0     3.4   0    
#> 6 3     1     2.4   0    
#> 7 3     3     2.5   0    
#> 8 3     4     1     1

dat %>%
  group_by(ID) %>%
  mutate(n12 = ifelse(sum(!is.na(n1)) >= 2, 1, 0)) %>%
  mutate(m12 = ifelse(sum(!is.na(m1)) >= 2, 1, 0))
#> # A tibble: 16 x 6
#> # Groups:   ID [4]
#>    ID    visit n1    m1      n12   m12
#>    <chr> <chr> <chr> <chr> <dbl> <dbl>
#>  1 1     0     5.6   0         1     0
#>  2 1     1     1.5   <NA>      1     0
#>  3 1     3     0.5   <NA>      1     0
#>  4 1     4     <NA>  <NA>      1     0
#>  5 2     0     6     1         0     1
#>  6 2     1     <NA>  0         0     1
#>  7 2     3     <NA>  0         0     1
#>  8 2     4     <NA>  0         0     1
#>  9 3     0     3.4   0         1     1
#> 10 3     1     2.4   0         1     1
#> 11 3     3     2.5   0         1     1
#> 12 3     4     1     1         1     1
#> 13 4     0     <NA>  <NA>      0     0
#> 14 4     1     <NA>  <NA>      0     0
#> 15 4     3     <NA>  <NA>      0     0
#> 16 4     4     3.3   0         0     0
Created on 2021-12-16 by the reprex package (v2.0.1)

Demonstration of behavior:

# Let's look at one ID
# group_by() is essentially doing the same thing
# i.e., summarizing by group
id1 <- dat[dat$ID==1,]
id1
#>   ID visit   n1   m1
#> 2  1     0  5.6    0
#> 3  1     1  1.5 <NA>
#> 4  1     3  0.5 <NA>
#> 5  1     4 <NA> <NA>

# Test for NA in variable n1
is.na(id1$n1)
#> [1] FALSE FALSE FALSE  TRUE

# Identify values that aren't NA
!is.na(id1$n1)
#> [1]  TRUE  TRUE  TRUE FALSE

# Count values that aren't NA
sum(!is.na(id1$n1))
#> [1] 3

# For the first approach:
#
# Filter() keeps values that meet the criteria
# In this case, we are keeping IDs with more than 2 non-NAs in the focal variable (n1)
#
# Create object in the environment based on this dplyr chain
# named "data2";
# using data from dplyr chain ".";
# inherits=TRUE makes it available outside of the chain
assign(x="data1", value=., inherits=TRUE)

# For the second approach:
#
# If the number of non-NAs is greater than 2 (per your criteria),
# return a 1, otherwise 0, in a new column
# A result of 1 here indicates this ID has two or more values that are non-NA
ifelse(test=sum(!is.na(id1$n1)) >= 2, yes=1, no=0)
#> [1] 1