Search code examples
rranking

Analyze ranked-choice data


I'm currently doing research and collecting a ranked-choice data. Basically people choosing their preferences in a topic. E.g., people ranking their preference on fruits: orange, mango, apple, avocado

The clean data frame looks like this:

            Fruits                          Color
1   orange;apple;banana;avocado     blue;yellow;red;green
2   avocado;apple;banana;orange     red;green;blue;yellow
3   apple;banana;orange;avocado     yellow;red;green;blue
4   banana;orange;apple;avocado     green;blue;red;yellow
5   apple;avocado;banana;orange     yellow;blue;green;red

The first person put orange as their first preference, then apple, banana, and avocado as the last preference. and so on

Scoring: 1st preference = 4; 2nd preference = 3; 3rd preference = 2; 4th preference = 1

Desired result

        apple   avocado banana  orange  blue    green   red yellow
    1   3       1       2       4       4       1       2       3
    2   3       4       2       1       2       3       4       1
    3   4       1       3       2       1       2       3       4
    4   2       1       4       3       3       4       2       1
    5   4       3       2       1       3       2       1       4

The part that I confused is to figure out how to give score for each column -> turn from semicolon separated string into column with numeric value. If I can pass this, I can create the desired output dataframe.

I've found pmr package, but the documentation only a few. Moreover, that package is too advance. I don't really need that for current state, just need simple scores for each preferences

Please help me at the scoring stage


Solution

  • Here's an approach that works using a few lapply() and vapply() calls, but will generalize to more columns.

    library(tibble)
    d <- tibble::tribble(
      ~Fruits,                          ~Color,
         "orange;apple;banana;avocado",     "blue;yellow;red;green",
         "avocado;apple;banana;orange",     "red;green;blue;yellow",
         "apple;banana;orange;avocado",     "yellow;red;green;blue",
         "banana;orange;apple;avocado",     "green;blue;red;yellow",
         "apple;avocado;banana;orange",     "yellow;blue;green;red"
    )
    
    
    x <- lapply(unname(d), \(col) {
      l <- col %>% strsplit(";")
      x <- l[[1]] %>% unique() %>% sort()
      out <- lapply(x, \(x) {
        vapply(l, FUN.VALUE = numeric(1), \(vec) which(rev(vec) == x))
      })
      names(out) <- x
      as.data.frame(out)
    })
    
    do.call(cbind, x)
    
      apple avocado banana orange blue green red yellow
    1     3       1      2      4    4     1   2      3
    2     3       4      2      1    2     3   4      1
    3     4       1      3      2    1     2   3      4
    4     2       1      4      3    3     4   2      1
    5     4       3      2      1    3     2   1      4