r, dataframe, data-cleaning

How to return a logical vector over a set of columns


I am searching for a set of values that could appear in any of 9 columns. The values all represent essentially the same thing, so I want to collapse those 9 columns into a single dichotomous (TRUE/FALSE) column for analysis.

    embolization = c("37200", "37211", "37213", "37214", "37236", "37237", "37238", "37239", "37241", "37242", "37243", "37244", "61624", "61626")
    embolization <- as.list(embolization)
    f = function(x) any(x == embolization, na.rm = FALSE)
    apply(df2, MARGIN = 1, FUN = f)

When I run this, I get an error saying "longer object length is not a multiple of shorter object length". I would appreciate help or a pointer in the right direction.

Here is a sample df:

     CPT1    CPT2    CPT3
1   49205   44015   38747
2   44015   38747   NULL
3   44015   38747   NULL
4   31624   NULL    NULL
5   NULL    NULL    NULL
6   43621   38747   44015
7   NULL    NULL    NULL

Say I want a new column that is TRUE whenever any of the values 38747, 30984, or 34445 appears in a row. The final df should look like this:

     CPT1    CPT2    CPT3  newcol
1   49205   44015   38747  TRUE
2   44015   38747   NULL   TRUE
3   44015   38747   NULL   TRUE
4   31624   NULL    NULL   FALSE
5   NULL    NULL    NULL   FALSE
6   43621   38747   44015  TRUE
7   NULL    NULL    NULL   FALSE

Solution

  • With apply you can use:

    embolization <- c(38747, 30984, 34445)
    df$newcol <- apply(df, 1, function(x) any(x %in% embolization))
    df
    
    #   CPT1  CPT2  CPT3 newcol
    #1 49205 44015 38747   TRUE
    #2 44015 38747  NULL   TRUE
    #3 44015 38747  NULL   TRUE
    #4 31624  NULL  NULL  FALSE
    #5  NULL  NULL  NULL  FALSE
    #6 43621 38747 44015   TRUE
    #7  NULL  NULL  NULL  FALSE
    

    Or with sapply and rowSums:

    df$newcol <- rowSums(sapply(df, `%in%`, embolization)) > 0
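The reason `%in%` works where `==` did not is that `==` compares the two vectors position by position (recycling the shorter one), while `%in%` tests each value for membership in the set. The following is a minimal, self-contained sketch of both approaches. It assumes the "NULL" entries in the sample are missing values (`NA`) and that the codes are stored as character strings; `cpt_cols`, `by_apply`, and `by_rowsums` are names made up for this illustration.

```r
# Hypothetical reconstruction of the sample data; "NULL" is assumed to mean NA.
df <- data.frame(
  CPT1 = c("49205", "44015", "44015", "31624", NA, "43621", NA),
  CPT2 = c("44015", "38747", "38747", NA, NA, "38747", NA),
  CPT3 = c("38747", NA, NA, NA, NA, "44015", NA),
  stringsAsFactors = FALSE
)
embolization <- c("38747", "30984", "34445")

# Restrict the search to the code columns so other columns are never scanned.
cpt_cols <- c("CPT1", "CPT2", "CPT3")

# `==` is positional: row 1 is compared as 49205 vs 38747, 44015 vs 30984,
# 38747 vs 34445, so the match in CPT3 is missed.
row1 <- unlist(df[1, cpt_cols])
any(row1 == embolization)    # FALSE
# `%in%` asks "is each value anywhere in the set?", which is what is wanted.
any(row1 %in% embolization)  # TRUE

# Row-wise version: one membership test per row.
by_apply <- apply(df[cpt_cols], 1, function(x) any(x %in% embolization))

# Column-wise version: one logical column per CPT column, then count hits per row.
by_rowsums <- rowSums(sapply(df[cpt_cols], `%in%`, embolization)) > 0

df$newcol <- by_rowsums
```

Both versions treat `NA` as "no match", because `NA %in% embolization` is `FALSE`. Selecting `df[cpt_cols]` rather than the whole `df` also keeps the result stable if the code is re-run after `newcol` has been added.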