I need to create a column in a dataset that reports, for each row, the modal text value across a selection of columns (ignoring NAs), with ties broken in favor of the most recent column.
Background: I have a dataset in which up to four coders rated participant transcripts (one participant per row). Occasionally a minority of coders either disagree or select the wrong code for a participant. So I need to reproducibly select the modal code across coders for each participant (i.e., for each row) and, when there is a tie, select the more recent modal code (because later codings are more likely to be correct).
Here's a fake example of the dataset with four coders' codes (Essay or Chat) for three participants (one per row).
> fakeData = data.frame(id = 1:3,
+ Condition = c("Essay", "Chat", "Chat"),
+ FirstCoder = c("NA","Essay","Essay"),
+ SecondCoder = c("NA","Chat","Essay"),
+ ThirdCoder = c("Essay","Chat","Chat"),
+ FourthCoder = c("Essay","NA","Chat"))
> fakeData
  id Condition FirstCoder SecondCoder ThirdCoder FourthCoder
1  1     Essay         NA          NA      Essay       Essay
2  2      Chat      Essay        Chat       Chat          NA
3  3      Chat      Essay       Essay       Chat        Chat
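One thing worth checking before any mode calculation: in the data above, "NA" is written in quotes, so it is the two-letter string "NA", not a missing value, and is.na() will not detect it (a real missing value in a character column prints as <NA> in a data frame). A minimal sketch, assuming you want those strings treated as genuinely missing, is to convert them once up front:

```r
# Rebuild the example data exactly as in the question, with "NA" as text
fakeData <- data.frame(id = 1:3,
                       Condition = c("Essay", "Chat", "Chat"),
                       FirstCoder = c("NA", "Essay", "Essay"),
                       SecondCoder = c("NA", "Chat", "Essay"),
                       ThirdCoder = c("Essay", "Chat", "Chat"),
                       FourthCoder = c("Essay", "NA", "Chat"))

is.na(fakeData$FirstCoder)  # all FALSE: "NA" here is a string, not a missing value

# Replace every "NA" string with a real missing value so is.na() / na.rm work
fakeData[fakeData == "NA"] <- NA

sum(is.na(fakeData))  # 3 missing codes in total
```

Writing NA without quotes inside data.frame() would create real missing values from the start.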
Regarding recency: "FirstCoder" coded first, "SecondCoder" coded next, then "ThirdCoder" submitted their code, and "FourthCoder" was the last (most recent) coder to submit a response.
Here are some methods I've tried from other forums (note that I need to ignore the "Condition" column):
> fakeData$ModalCode1 <- apply(fakeData,1,function(x) names(which.max(table(c("FirstCoder","SecondCoder", "ThirdCoder", "FourthCoder")))))
> fakeData$ModalCode2 <- apply(select(fakeData,ends_with("Coder")), 1, Mode)
The correct result would be this column (created manually):
> fakeData$MostRecentModalCode <- c("Essay", "Chat", "Chat")
You can see that none of my attempts produces the correct result (i.e., "MostRecentModalCode").
> fakeData
  id Condition FirstCoder SecondCoder ThirdCoder FourthCoder ModalCode1 ModalCode2 MostRecentModalCode
1  1     Essay         NA          NA      Essay       Essay FirstCoder         NA               Essay
2  2      Chat      Essay        Chat       Chat          NA FirstCoder       Chat                Chat
3  3      Chat      Essay       Essay       Chat        Chat FirstCoder      Essay                Chat
As you can see, the final (correct) column ignores NAs and breaks modal ties in favor of the more recent coders' responses (unlike the traditional Mode function).
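A minimal sketch of the rule as stated, written as a hypothetical helper most_recent_mode() (not a function from any package): drop missing codes, count each remaining code, and among the codes tied for the highest count return the one whose last occurrence is furthest to the right. It assumes the coder columns are passed in order from oldest to newest and that missing codes are real NA values rather than "NA" strings.

```r
# Hypothetical helper: modal value of x ignoring NA; ties go to the value
# whose latest occurrence is furthest right (x ordered oldest -> newest).
most_recent_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)  # keep apply() returning a vector
  counts <- table(x)
  tied <- names(counts)[counts == max(counts)]
  # Position of the last occurrence of each tied value
  last_pos <- vapply(tied, function(v) max(which(x == v)), integer(1))
  tied[which.max(last_pos)]
}

# Example data with real missing values
fakeData <- data.frame(id = 1:3,
                       Condition = c("Essay", "Chat", "Chat"),
                       FirstCoder = c(NA, "Essay", "Essay"),
                       SecondCoder = c(NA, "Chat", "Essay"),
                       ThirdCoder = c("Essay", "Chat", "Chat"),
                       FourthCoder = c("Essay", NA, "Chat"))

# Coder columns only (id and Condition excluded), oldest coder first
coders <- c("FirstCoder", "SecondCoder", "ThirdCoder", "FourthCoder")
fakeData$MostRecentModalCode <- apply(fakeData[coders], 1, most_recent_mode)
fakeData$MostRecentModalCode
```

Making the tie-break explicit means the result does not depend on how a particular Mode implementation orders tied values.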
Surely there's a function for this, but I am just failing to find or correctly implement it.
Advice and solutions welcome! (If I have to create a custom function, that's fine, albeit surprising.)
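We can use the Mode function from here, applied only to the coder columns (so id and Condition are excluded), listed most recent coder first so that ties go to the later coder, after dropping missing codes (real NA or the "NA" strings in fakeData):
> Mode <- function(x) {
+ ux <- unique(x)
+ ux[which.max(tabulate(match(x, ux)))]
+ }
>
> coders <- c("FourthCoder", "ThirdCoder", "SecondCoder", "FirstCoder")
> apply(fakeData[coders], 1, function(x) Mode(x[!is.na(x) & x != "NA"]))
[1] "Essay" "Chat" "Chat"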
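A quick check of why column order matters, using row 3 (a 2-2 tie between Essay and Chat). unique() keeps values in order of first appearance and which.max() returns the first maximum, so Mode() hands a tie to whichever code it meets first; the row3 vector below is just that row typed out by hand:

```r
Mode <- function(x) {
  ux <- unique(x)                       # distinct values, first-appearance order
  ux[which.max(tabulate(match(x, ux)))] # first value with the highest count
}

# Row 3 of fakeData, coder columns in submission order (oldest first)
row3 <- c(FirstCoder = "Essay", SecondCoder = "Essay",
          ThirdCoder = "Chat", FourthCoder = "Chat")

Mode(row3)       # tie goes to "Essay", the first code seen
Mode(rev(row3))  # tie goes to "Chat", the most recent coder's code
```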