Search code examples
rnadelete-row

Only keep levels that are incomplete, based on two columns


I could really use some help with the following:

I some sample data:

Group<-c("A","A","B","B","C","C","D", "D")
Value1<-c("7","1",8,7,"NA",9,10,12)
Value2<-c("NA","NA","NA",7,3,9,7,4)
df<-data.frame(Group, Value1, Value2)

  Group Value1 Value2
     A      7     NA
     A      1     NA
     B      8     NA
     B      7      7
     C     NA      3
     C      9      9
     D     10      7
     D     12      4

I want delete all groups that are either fully filled in for Value1 and Value2 (like Group D) or have no data for Value2 within a group (like Group A). So that I end up with the groups that are filled in for both Value1 and Value2 but are incomplete, like:

  Group Value1 Value2
     B      8     NA
     B      7      7
     C     NA      3
     C      9      9

I Know how to delete NA's, and I have tried some things with

setDT()[,  := if(any(Value2==)) "" else "" , by = .()]

but I don't really now how to use it in this case..

Does someone know how I could do this?


Solution

  • Something like:

    setDT(df)[, .SD[!all(is.na(Value2)) & (anyNA(Value1) | anyNA(Value2))], Group]
    

    Output:

       Group Value1 Value2
    1:     B      8     NA
    2:     B      7      7
    3:     C   <NA>      3
    4:     C      9      9
    

    Note that in order for this to work, your values should indeed be NA and not just "NA" as strings, i.e. it works with a modified dataframe example:

    Group<-c("A","A","B","B","C","C","D", "D")
    Value1<-c("7","1",8,7,NA,9,10,12)
    Value2<-c(NA,NA,NA,7,3,9,7,4)
    df<-data.frame(Group, Value1, Value2)