I have two data sets: The first data set contains participants' numerical answers to questions:
data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))
The second data set serves as a reference table where the solutions are stored:
ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))
I would like to compare the two data sets and create a new data set that contains the binary information on whether the answer was correct (1) or incorrect (0). For this, answers 1, 2, 3 correspond to "small", and answers 4, 5 correspond to "big".
My attempt is the following:
accuracy <- data.frame(lapply(data, function(x) {ifelse(x >= 4 & ref.table$Solution[ref.table$Question == colnames(data)[x]] == "big", 1, 0)}))
But somehow, this only gives me the incorrect answers as 0, while the correct answers are NA.
Does anyone know how to solve this? Thank you!
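The NA values most likely come from the indexing step: inside lapply, x holds the column's values rather than its name, so colnames(data)[x] uses the answers themselves as positions. A minimal sketch of what that lookup returns for Q1 (the variable name idx is only for illustration):

```r
data <- data.frame(Q1 = 1:5, Q2 = rev(1:5), Q3 = c(4, 5, 1, 2, 3))

# lapply(data, ...) hands the function the values of a column, not its name
x <- data$Q1

# Indexing the three column names with the values 1..5 runs past the end,
# so positions 4 and 5 come back as NA
idx <- colnames(data)[x]
idx
# "Q1" "Q2" "Q3" NA NA
```

Rows with an answer of 4 or 5 therefore end up comparing against NA, and TRUE & NA evaluates to NA, which ifelse passes through unchanged.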
With tidyverse, loop across the columns, match the column name (cur_column()) with the 'Question' column from 'ref.table', get the corresponding 'Solution' value, check whether it is 'big' and the value of the column is >= 4, and coerce the logical to binary
library(dplyr)
data %>%
  mutate(across(everything(), ~ +(.x >= 4 &
    ref.table$Solution[match(cur_column(), ref.table$Question)] ==
      "big")))
-output
  Q1 Q2 Q3
1  0  1  0
2  0  1  0
3  0  0  0
4  1  0  0
5  1  0  0
Or in base R, loop over the column names in lapply and extract the column with [[; the logic applied with match is the same as above
data[] <- lapply(names(data), \(nm) +(data[[nm]] >= 4 &
  ref.table$Solution[match(nm, ref.table$Question)] == "big"))
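Note that the condition above only returns 1 when the answer is 4 or 5 and the solution is "big", so Q3 (solution "small") comes out as all 0. If answers 1 to 3 on a "small" question should also count as correct, one possible variant is to label each answer first and then compare the label with the solution. This is a sketch under that assumption; the names accuracy, label and solution are illustrative:

```r
data <- data.frame(Q1 = 1:5, Q2 = rev(1:5), Q3 = c(4, 5, 1, 2, 3))
ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

# Work on a copy so the original answers are kept
accuracy <- data
accuracy[] <- lapply(names(data), \(nm) {
  # Answers 4 and 5 map to "big", answers 1 to 3 map to "small"
  label <- ifelse(data[[nm]] >= 4, "big", "small")
  # Look up the stored solution for this question
  solution <- ref.table$Solution[match(nm, ref.table$Question)]
  # 1 if the label matches the solution, 0 otherwise
  +(label == solution)
})
accuracy
#   Q1 Q2 Q3
# 1  0  1  0
# 2  0  1  0
# 3  0  0  1
# 4  1  0  1
# 5  1  0  1
```

Q1 and Q2 are unchanged from the output above; only Q3 differs, because the last three answers (1, 2, 3) now match "small".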