
How to use the current column name in an lapply function in R?


I have two data sets. The first contains participants' numerical answers to the questions:

data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))

The second data set serves as a reference table where the solutions are stored:

ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

I would like to compare the two data sets and create a new data set that records, for each answer, whether it was correct (1) or incorrect (0). Answers 1, 2 and 3 correspond to "small", and answers 4 and 5 correspond to "big".

My attempt is the following:

accuracy <- data.frame(lapply(data, function(x) {ifelse(x >= 4 & ref.table$Solution[ref.table$Question == colnames(data)[x]] == "big", 1, 0)}))

However, this returns 0 for the incorrect answers but NA for the correct ones.

Does anyone know how to solve this? Thank you!


Solution
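Why the attempt returns NA: lapply(data, f) calls f with each column's values, not with the column's name, so inside the attempt x is the vector of answers and colnames(data)[x] uses those answers as column positions. data has only three columns, so answers 4 and 5 index past the end and produce NA. A minimal check, re-creating the data from the question:

```r
data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))

# lapply(data, f) passes each column's *values* to f; the name is not passed.
seen <- lapply(data, function(x) x)
identical(seen$Q1, 1:5)  # TRUE

# In the attempt, colnames(data)[x] therefore uses the answers as positions.
# data has only 3 columns, so positions 4 and 5 give NA.
pos_lookup <- colnames(data)[data$Q1]
pos_lookup  # "Q1" "Q2" "Q3" NA NA
```

Those NA values then propagate through == and &, so ifelse() returns NA for them. Both approaches below avoid this by getting the column name directly: cur_column() in dplyr, names(data) in base R.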

  • With the tidyverse, loop across the columns, match the current column name (cur_column()) against the 'Question' column of 'ref.table' to get the corresponding 'Solution', check that it is 'big' and that the column value is >= 4, and coerce the logical result to binary with +()

    library(dplyr)
    # For each column, look up its solution via cur_column(); +() turns TRUE/FALSE into 1/0
    data %>%
      mutate(across(everything(),
                    ~ +(.x >= 4 &
                          ref.table$Solution[match(cur_column(), ref.table$Question)] == "big")))
    

    Output:

      Q1 Q2 Q3
    1  0  1  0
    2  0  1  0
    3  0  0  0
    4  1  0  0
    5  1  0  0
    

    Or, in base R, loop over the column names with lapply(), extract each column with [[, and apply the same match() logic as above:

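    # Loop over the names so each solution can be matched by name (overwrites data in place);
    # the \(nm) lambda shorthand needs R >= 4.1
    data[] <- lapply(names(data), \(nm) +(data[[nm]] >= 4 &
        ref.table$Solution[match(nm, ref.table$Question)] == "big"))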
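Both snippets keep the question's condition x >= 4 & Solution == "big", so a question whose solution is "small" (Q3 here) scores 0 for every answer, as the output above shows. If, as the question describes, an answer should count as correct whenever its category matches the stored solution, one option is to map each answer to a category first and then compare it with the solution. A base R sketch, assuming 1 to 3 means "small" and 4 to 5 means "big"; category() is a hypothetical helper introduced only for this illustration:

```r
data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))
ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

# Hypothetical helper (assumption): answers 4-5 are "big", 1-3 are "small".
category <- function(x) ifelse(x >= 4, "big", "small")

accuracy <- data
accuracy[] <- lapply(names(data), function(nm) {
  # Look up this question's stored solution by column name.
  sol <- ref.table$Solution[match(nm, ref.table$Question)]
  # 1 when the answer's category matches the solution, 0 otherwise.
  as.integer(category(data[[nm]]) == sol)
})
accuracy
#>   Q1 Q2 Q3
#> 1  0  1  0
#> 2  0  1  0
#> 3  0  0  1
#> 4  1  0  1
#> 5  1  0  1
```

In the dplyr version, the same change amounts to replacing the condition inside across() with category(.x) == ref.table$Solution[match(cur_column(), ref.table$Question)].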