r dplyr across

apply function across columns using column names within function


I am trying to iterate over 100 columns to identify whether the value in a separate column matches each column's name. I thought the across function might be able to do this, but I can't figure out how to use it with mutate on each column. See the example below.

tst=structure(list(type = c("DOG", "DOG", "DOG", "CAT", "CAT", "CAT", 
"MOUSE", "MOUSE", "MOUSE"), CAT = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), DOG = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), MOUSE = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), id = 1:9), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))

My table currently has the following structure.

   type  CAT   DOG   MOUSE    id
  <chr> <chr> <chr> <chr> <int>
1 DOG   NA    NA    NA        1
2 DOG   NA    NA    NA        2
3 DOG   NA    NA    NA        3
4 CAT   NA    NA    NA        4
5 CAT   NA    NA    NA        5
6 CAT   NA    NA    NA        6
7 MOUSE NA    NA    NA        7
8 MOUSE NA    NA    NA        8
9 MOUSE NA    NA    NA        9

I would like the end result to look like this:

   type  CAT   DOG   MOUSE    id
  <chr> <chr> <chr> <chr> <int>
1 DOG   NA    TRUE  NA        1
2 DOG   NA    TRUE  NA        2
3 DOG   NA    TRUE  NA        3
4 CAT   TRUE  NA    NA        4
5 CAT   TRUE  NA    NA        5
6 CAT   TRUE  NA    NA        6
7 MOUSE NA    NA    TRUE      7
8 MOUSE NA    NA    TRUE      8
9 MOUSE NA    NA    TRUE      9 

This works, but it does not scale to 100 columns.

tst <- tst %>% mutate(CAT = ifelse(type == names(tst[2]), 'TRUE', NA))
tst <- tst %>% mutate(DOG = ifelse(type == names(tst[3]), 'TRUE', NA))
tst <- tst %>% mutate(MOUSE = ifelse(type == names(tst[4]), 'TRUE', NA))

Solution

  • One candidate solution, without dplyr, is the following:

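    # initialise list
    tmpList <- list()
    
    # iterate over each row; columns 1 (type) and 5 (id) are excluded
    for (i in seq_len(nrow(tst))) {
      
      # logical vector: TRUE where the column name equals this row's type
      tmpList[[i]] <- colnames(tst[-c(1,5)]) %in% tst$type[i]
      
    }
    
    # stack the per-row vectors and save as data frame
    output <- as.data.frame(do.call(rbind, tmpList))
    colnames(output) <- colnames(tst[-c(1,5)])
    
    # cbind with the type and id columns
    output <- cbind(tst[,c(1,5)],output)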
    

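    This gives one logical column per animal, close to what you are looking for: the values are TRUE/FALSE rather than 'TRUE'/NA, and id ends up right after type. If there is a better solution, it does not readily come to mind.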
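
    If the exact 'TRUE'/NA layout from the question matters, a possible variation (still without dplyr) is to overwrite the columns in place, one column at a time. This is only a sketch: it assumes the columns to fill are everything except type and id, and target_cols is a name made up for the example.

```r
# Rebuild the example data (same shape as the dput() in the question).
tst <- data.frame(
  type  = rep(c("DOG", "CAT", "MOUSE"), each = 3),
  CAT   = NA_character_,
  DOG   = NA_character_,
  MOUSE = NA_character_,
  id    = 1:9,
  stringsAsFactors = FALSE
)

# Assumption: every column other than type and id should be filled.
target_cols <- setdiff(names(tst), c("type", "id"))

# For each target column, compare the whole type vector to the column name.
# Assigning a list back into tst[target_cols] keeps the original column order.
tst[target_cols] <- lapply(target_cols, function(col) {
  ifelse(tst$type == col, "TRUE", NA_character_)
})
```

    Because the comparison is vectorised over rows, the loop runs once per column rather than once per row.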

    Best!
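
    Since the question asks about across specifically: dplyr (1.0.0 or later) provides cur_column(), which returns the name of the column currently being processed inside across. A minimal sketch, assuming the columns to fill are everything except type and id (result is a name chosen for the example):

```r
library(dplyr)

# Rebuild the example data (same shape as the dput() in the question).
tst <- tibble(
  type  = rep(c("DOG", "CAT", "MOUSE"), each = 3),
  CAT   = NA_character_,
  DOG   = NA_character_,
  MOUSE = NA_character_,
  id    = 1:9
)

# For every column except type and id, compare type against the name of
# the column being processed; cur_column() supplies that name.
result <- tst %>%
  mutate(across(
    -c(type, id),
    ~ ifelse(type == cur_column(), "TRUE", NA_character_)
  ))
```

    With 100 columns the call stays the same, as long as the selection inside across picks out the right columns.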