Tags: r, text, frequency, mode, desctools

Most frequent factor across specific columns—with recency breaking ties


I need to create a column in a dataset that reports the most recent row-wise modal text value in a selection of columns (ignoring NAs).

Background: I have a dataset in which up to 4 coders rated participant transcripts (one participant per row). Occasionally a minority of coders either disagree or select the wrong code for a participant. So I need to reproducibly select the modal code across coders for each participant (i.e., for each row) and, when there is a tie, select the most recent (later) modal code, because later codings are more likely to be correct.

Here's a fake example of the dataset with four coders' codes (Essay or Chat) for 3 participants (one per row).

> fakeData = data.frame(id = 1:3,
+                 Condition = c("Essay", "Chat", "Chat"),
+                 FirstCoder = c("NA","Essay","Essay"),
+                 SecondCoder = c("NA","Chat","Essay"),
+                 ThirdCoder = c("Essay","Chat","Chat"),
+                 FourthCoder = c("Essay","NA","Chat"))
> fakeData
  id Condition FirstCoder SecondCoder ThirdCoder FourthCoder
1  1     Essay         NA          NA      Essay       Essay
2  2      Chat      Essay        Chat       Chat          NA
3  3      Chat      Essay       Essay       Chat        Chat
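One detail worth checking, since the goal is to ignore NAs: the example builds its missing entries as the quoted string "NA", which R treats as ordinary text rather than a missing value. The sketch below illustrates the difference on a single column; the vector name `codes` is only for illustration.

```r
# The FirstCoder column as written in the example: "NA" is a string here.
codes <- c("NA", "Essay", "Essay")
before <- is.na(codes)       # FALSE FALSE FALSE: nothing is actually missing

# Convert the placeholder string to a real missing value.
codes[codes == "NA"] <- NA
after <- is.na(codes)        # TRUE FALSE FALSE

before
after
```

Any NA-aware function (or `na.rm = TRUE` argument) only skips these entries after such a conversion, unless it is told to treat the string "NA" as missing as well.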

Regarding recency: The "FirstCoder" coded first, "SecondCoder" coded next, then the "ThirdCoder" submitted their code, and "FourthCoder" was the last (and most recent) coder to submit a response.

Here are some methods I've tried from other forums—notice how I need to ignore the "Condition" column:

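> library(dplyr)  # for select() and ends_with(); Mode() is not base R and must be defined or loaded separately
> # Attempt 1 tabulates the four column-name strings, not each row's values, so every row gets "FirstCoder"
> fakeData$ModalCode1 <- apply(fakeData,1,function(x) names(which.max(table(c("FirstCoder","SecondCoder", "ThirdCoder", "FourthCoder")))))
> fakeData$ModalCode2 <- apply(select(fakeData,ends_with("Coder")), 1, Mode)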

The correct result would be this column (created manually)

> fakeData$MostRecentModalCode <- c("Essay", "Chat", "Chat")

You can see that neither of my attempts produces the correct result (i.e., "MostRecentModalCode").

> fakeData
  id Condition FirstCoder SecondCoder ThirdCoder FourthCoder ModalCode1 ModalCode2 MostRecentModalCode
1  1     Essay         NA          NA      Essay       Essay FirstCoder         NA               Essay
2  2      Chat      Essay        Chat       Chat          NA FirstCoder       Chat                Chat
3  3      Chat      Essay       Essay       Chat        Chat FirstCoder      Essay                Chat

As you can see, the final (correct) column ignores NAs and breaks modal ties in favor of the more recent coders' responses (unlike a traditional Mode function).
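To make the tie concrete, here is a small sketch using participant 3's codes (Essay, Essay, Chat, Chat). It shows that common "first maximum" approaches resolve the tie by whatever order the values happen to be tabulated in, not by recency. The variable names are illustrative only.

```r
# Participant 3's codes, ordered from first coder to fourth coder.
row3 <- c("Essay", "Essay", "Chat", "Chat")

# table() sorts values alphabetically (Chat, Essay), and which.max()
# returns the first maximum, so the tie goes to "Chat" by alphabet.
counts <- table(row3)
alphabetical_pick <- names(which.max(counts))

# unique() keeps order of first appearance (Essay, Chat), so the same
# first-maximum rule now hands the tie to "Essay".
ux <- unique(row3)
first_seen_pick <- ux[which.max(tabulate(match(row3, ux)))]

alphabetical_pick  # "Chat"
first_seen_pick    # "Essay"
```

Neither rule looks at which coder responded last, which is why a recency-aware tie-break has to be added explicitly.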

Surely there's a function for this, but I am just failing to find or correctly implement it.

Advice and solutions welcome! (If I have to create a custom function, that's fine—albeit surprising.)


Solution

  • We can use the following Mode function (taken from a linked source) and apply it to each row:

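    > # Returns the most frequent value; ties go to the value that appears first in x
    > Mode <- function(x) {
    +   ux <- unique(x)                          # distinct values, in order of first appearance
    +   ux[which.max(tabulate(match(x, ux)))]    # which.max() keeps the first maximum
    + }
    > 
    > # fakeData[-1] drops only id, so Condition is counted along with the four coder columns
    > apply(fakeData[-1], 1, Mode)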
    [1] "Essay" "Chat"  "Chat"
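If the mode should be computed from the four coder columns only, with ties going to the more recent coder as the question describes, one possible sketch is below. It is not from the original answer: `recent_mode`, its `na_strings` argument, and `coder_cols` are hypothetical names, and it assumes that "most recent" means the tied value whose last occurrence is furthest to the right, with columns ordered oldest to newest. It treats both real `NA` and the string "NA" as missing.

```r
# Rebuild the example data from the question ("NA" entries are strings there).
fakeData <- data.frame(
  id = 1:3,
  Condition = c("Essay", "Chat", "Chat"),
  FirstCoder = c("NA", "Essay", "Essay"),
  SecondCoder = c("NA", "Chat", "Essay"),
  ThirdCoder = c("Essay", "Chat", "Chat"),
  FourthCoder = c("Essay", "NA", "Chat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: x holds one row's codes, ordered oldest to newest.
recent_mode <- function(x, na_strings = "NA") {
  x <- as.character(x)
  x <- x[!is.na(x) & !(x %in% na_strings)]   # drop real NAs and "NA" strings
  if (length(x) == 0) return(NA_character_)  # no usable codes in this row
  counts <- table(x)
  tied <- names(counts)[counts == max(counts)]
  # Among the tied values, keep the one whose last occurrence is latest.
  last_pos <- vapply(tied, function(v) max(which(x == v)), integer(1))
  tied[which.max(last_pos)]
}

# Column order encodes recency, so list the coders from first to last.
coder_cols <- c("FirstCoder", "SecondCoder", "ThirdCoder", "FourthCoder")
fakeData$MostRecentModalCode <- apply(fakeData[coder_cols], 1, recent_mode)
fakeData$MostRecentModalCode
# "Essay" "Chat" "Chat"
```

Selecting columns by name keeps `Condition` out of the count, and because position in `coder_cols` stands in for recency, reordering that vector changes which coder wins a tie.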