Search code examples
rdataframerankranking

Ranking a list of terms by the values in several columns


I manage a film festival that holds an annual 'best film' competition. There can be up to 50 films and any number of judges each year.

I plan to use R to analyse the various judges' scores then publish a report in Markdown that I'll distribute to them.

Here is what the dataframe with example raw data will look like:

Film = c("Vertigo", "Fargo", "Platoon", "Beetlejuice", "Zodiac")
Mean = c( 7.45,6.98,7.5,7.2,5.82)
TrimmedMean = c(7.42, 7.04,7.36,7.05,6.26)
Judge1 = c(5.56,6.00,7.00,6.22,5.33)
Judge2 = c(6.89,8.81,8.83,7.22,6.78)
Judge3 = c(9.25,6.38,6.71,6.88,6.50)
Judge4 = c(8.00,7.67,7.44,6.89,7.22)
Judge5 = c(5.11,4.89,7.11,8.78,6.44)
Judge6 = c(9.89,8.11,7.89,7.22,2.67)
RawData = data.frame(Film,Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)

The 'Average' and 'Trimmed Average' columns can be considered as additional 'judges' for the purpose of this exercise.

I want to rank the films for each of the judges by their scores and then remove the numerical values, leaving a sorted list of films that looks like this:

Mean = c('Platoon','Vertigo','Beetlejuice','Fargo','Zodiac')
TrimmedMean = c('Vertigo','Platoon','Beetlejuice','Fargo','Zodiac')
Judge1 = c('Platoon','Beetlejuice','Fargo','Vertigo','Zodiac')
Judge2 = c('Platoon','Fargo','Beetlejuice','Vertigo','Zodiac')
Judge3 = c('Vertigo','Beetlejuice','Platoon','Zodiac','Fargo')
Judge4 = c('Vertigo','Fargo','Platoon','Zodiac','Beetlejuice')
Judge5 = c('Beetlejuice','Platoon','Zodiac','Vertigo','Fargo')
Judge6 = c('Vertigo','Fargo','Platoon','Beetlejuice','Zodiac')
SortedData = data.frame(Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)

The master dataset from which I produced the first table has data going back 5 years which I can subset by Year. I would like to be able to analyse data from different years without having to make changes in the code for each year other than specifying the year to subset. The code should be responsive to datasets from different years which will have different numbers of judges and films.

Any ideas?


Here is the R code and the CSV file for my analysis. I've changed the names of the judges to Judge 1, Judge 2, etc for privacy.


Solution

  • I think the function you are looking for is order(). It shows the relative order of the values in a vector.

    order(c(1.65, 6.92, 7.98, 3.42, 5.75), decreasing=TRUE)
    # [1] 3 2 5 4 1
    

    We can use this order, for each of the judges, to lookup the names in the Film column.

    data.frame(lapply(RawData[, -1], function(x) RawData$Film[order(x, decreasing=TRUE)]))
    #          Mean TrimmedMean      Judge1      Judge2      Judge3      Judge4      Judge5      Judge6
    # 1     Platoon     Vertigo     Platoon     Platoon     Vertigo     Vertigo Beetlejuice     Vertigo
    # 2     Vertigo     Platoon Beetlejuice       Fargo Beetlejuice       Fargo     Platoon       Fargo
    # 3 Beetlejuice Beetlejuice       Fargo Beetlejuice     Platoon     Platoon      Zodiac     Platoon
    # 4       Fargo       Fargo     Vertigo     Vertigo      Zodiac      Zodiac     Vertigo Beetlejuice
    # 5      Zodiac      Zodiac      Zodiac      Zodiac       Fargo Beetlejuice       Fargo      Zodiac
    

    The same, but a bit more elegant (I think) and faster

    SortedData <- RawData[, -1]
    SortedData[] <- RawData$Film[sapply(RawData[, -1], order, decreasing=TRUE)]
    
    SortedData
    #          Mean TrimmedMean      Judge1      Judge2      Judge3      Judge4      Judge5      Judge6
    # 1     Platoon     Vertigo     Platoon     Platoon     Vertigo     Vertigo Beetlejuice     Vertigo
    # 2     Vertigo     Platoon Beetlejuice       Fargo Beetlejuice       Fargo     Platoon       Fargo
    # 3 Beetlejuice Beetlejuice       Fargo Beetlejuice     Platoon     Platoon      Zodiac     Platoon
    # 4       Fargo       Fargo     Vertigo     Vertigo      Zodiac      Zodiac     Vertigo Beetlejuice
    # 5      Zodiac      Zodiac      Zodiac      Zodiac       Fargo Beetlejuice       Fargo      Zodiac
    

    Using the linked data

    dtf <- structure(list(Judge=rep(paste("Judge", 1:5), each=14),
      Year=rep(2023, 70), Film=rep(c("And Then It Rained",
      "Catching Climate", "Compact", "Crumble", "Cyanosis",
      "Don't Leave Me", "Love Cut", "Lungs", "Mute", "Ragdoll",
      "Room", "The Demon Within", "The Ghost Hunter",
      "The Outside Dunny"), 5), Originality.Creativity=c(7, 6, 7, 7, 6,
      6, 6, 7, 6, 7, 7, 5, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 7, 6,
      6, 7, 5, 5, 5, 8, 3, 8, 7, 3, 3, 3, 5, 3, 7, 7, 8, 8, 9, 7, 9,
      5, 9, 7, 10, 8, 9, 10, 9, 8, 7, 7, 7, 6, 8, 8, 6, 8, 6, 6, 7,
      8, 8, 8), Direction=c(6, 5, 6, 6, 6, 6, 7, 6, 6, 7, 6, 6, 7, 5,
      6, 7, 7, 7, 8, 7, 7, 7, 8, 7, 8, 7, 7, 8, 7, 5, 6, 8, 4, 8, 6,
      3, 4, 5, 7, 4, 8, 6, 9, 8, 8, 9, 9, 4, 9, 7, 10, 8, 8, 9, 9, 9,
      8, 6, 7, 8, 8, 9, 6, 7, 7, 7, 8, 8, 8, 7), Writing=c(6, 5, 6,
      6, 6, 6, 6, 5, 6, 7, 6, 6, 7, 6, 7, 7, 7, 7, 7, 6, 7, 6, 8, 8,
      8, 7, 7, 8, 7, 5, 8, 9, 4, 8, 7, 4, 4, 4, 8, 3, 8, 7, 8, 9, 9,
      9, 8, 3, 8, 6, 9, 7, 8, 9, 9, 9, 8, 6, 5, 6, 8, 8, 6, 7, 6, 7,
      8, 7, 8, 7), Cinematography=c(6, 5, 6, 6, 6, 6, 6, 5, 6, 8, 6,
      6, 7, 5, 7, 7, 8, 7, 8, 7, 7, 6, 8, 8, 8, 7, 8, 8, 8, 5, 6, 9,
      6, 8, 5, 3, 5, 6, 5, 5, 7, 5, 9, 8, 10, 10, 10, 4, 8, 7, 9, 10,
      9, 9, 10, 9, 8, 6, 6, 7, 9, 8, 7, 7, 7, 8, 7, 8, 9, 6),
      Performance=c(6, 5, 5, 6, 6, 6, 7, 5, 6, 8, 5, 6, 7, 5, 7, 7,
      8, 7, 8, 7, 7, 7, 9, 8, 8, 7, 8, 8, 6, 5, 8, 9, 5, 8, 5, 2, 6,
      6, 4, 4, 8, 5, 9, 9, 10, 10, 10, 4, 9, 10, 10, 10, 9, 10, 10,
      9, 8, 6, 6, 7, 9, 8, 6, 8, 8, 7, 7, 7, 8, 6),
      ProductionValue=c(6, 5, 5, 6, 6, 6, 6, 5, 6, 8, 5, 6, 7, 5, 7,
      8, 7, 7, 8, 7, 6, 7, 8, 7, 8, 7, 8, 8, 7, 5, 6, 8, 5, 8, 6, 3,
      4, 4, 6, 5, 8, 5, 8, 8, 8, 9, 9, 3, 8, 7, 10, 8, 8, 9, 9, 8, 8,
      6, 6, 8, 9, 8, 7, 7, 7, 8, 7, 8, 7, 6), Pacing=c(6, 5, 5, 5, 6,
      6, 6, 5, 6, 7, 5, 6, 7, 5, 7, 7, 7, 6, 8, 7, 7, 7, 8, 8, 8, 7,
      8, 8, 7, 5, 6, 8, 5, 8, 6, 3, 5, 3, 6, 5, 8, 5, 7, 9, 7, 9, 8,
      2, 9, 8, 9, 8, 9, 10, 9, 9, 7, 7, 6, 7, 8, 8, 7, 7, 6, 7, 8, 7,
      6, 7), Structure=c(6, 5, 6, 5, 6, 6, 6, 5, 6, 6, 5, 6, 7, 5, 7,
      7, 7, 7, 8, 7, 7, 6, 8, 7, 8, 7, 8, 8, 7, 5, 6, 9, 5, 8, 7, 3,
      4, 4, 6, 6, 8, 4, 7, 9, 8, 8, 8, 1, 8, 8, 9, 7, 8, 9, 10, 9, 7,
      7, 6, 7, 8, 8, 7, 7, 7, 7, 8, 7, 7, 7), Sound.Music=c(6, 5, 6,
      6, 6, 6, 6, 6, 6, 7, 6, 7, 7, 5, 6, 5, 6, 8, 7, 7, 7, 6, 8, 6,
      8, 8, 7, 7, 7, 5, 6, 9, 5, 8, 7, 3, 4, 6, 5, 7, 8, 5, 8, 8, 9,
      9, 9, 1, 8, 8, 9, 8, 9, 9, 9, 8, 6, 6, 6, 8, 8, 7, 8, 7, 6, 9,
      7, 8, 9, 7)), row.names=c(NA, -70L), class="data.frame")
    
    
    # Order the data.frame, just in case
    dtf <- dtf[with(dtf, order(Judge, Film, Year)),]
    rownames(dtf) <- NULL
    
    # Split the data.frame based on Judge
    dtf.s <- split(dtf, ~ Judge)
    
    # Check if all the films are the same
    s <- sapply(dtf.s, function(x) x$Film)
    stopifnot(all(s[, 1] == s[, -1]))
    
    # Calculate average scores per judge
    ascores <- sapply(dtf.s, 
      function(x) rowMeans(x[,-(1:3)])
    )
    ascores
    #      Judge 1   Judge 2   Judge 3   Judge 4   Judge 5
    # 1  6.1111111 6.6666667 6.7777778 8.1111111 7.4444444
    # 2  5.1111111 6.7777778 5.0000000 8.4444444 6.3333333
    # 3  5.7777778 7.0000000 6.3333333 8.6666667 6.1111111
    # 4  5.8888889 6.8888889 8.5555556 8.8888889 7.1111111
    # 5  6.0000000 7.5555556 4.6666667 8.8888889 8.3333333
    # 6  6.0000000 6.7777778 8.0000000 3.0000000 8.0000000
    # 7  6.2222222 6.7777778 6.2222222 8.4444444 6.6666667
    # 8  5.4444444 6.4444444 3.0000000 7.5555556 7.2222222
    # 9  6.0000000 8.0000000 4.3333333 9.4444444 6.6666667
    # 10 7.2222222 7.2222222 4.5555556 8.2222222 7.3333333
    # 11 5.6666667 7.8888889 5.7777778 8.5555556 7.4444444
    # 12 6.0000000 7.0000000 4.6666667 9.3333333 7.5555556
    # 13 7.0000000 7.4444444 7.7777778 9.3333333 7.7777778
    # 14 5.3333333 7.7777778 5.4444444 8.6666667 6.7777778
    
    # Calculate average score across judges
    avg.score <- rowMeans(ascores)
    
    # Calculate olympic average score across judges
    # Discard highest and lowest values
    olympic <- function(x) {
        (sum(x)-sum(range(x)))/(length(x)-2)
    }
    
    oavg.score <- apply(ascores, 1, olympic)
    
    # Calculate winsorized average score across judges
    # Clamp values to second highest and second lowest
    winsor <- function(x) {
        xtr <- x[-c(which.max(x), which.min(x))]
        (sum(xtr)+sum(range(xtr)))/length(x)
    }
    
    wavg.score <- apply(ascores, 1, winsor)
    
    # Combine scores into a matrix
    scores <- cbind(mean=avg.score, olympic=oavg.score, winsor=wavg.score, ascores)
    
    # Get film rankings, highest scoring at the top
    ranks <- scores
    ranks[] <- dtf.s[[c(1, 3)]][apply(scores, 2, order, decreasing=TRUE)]
    
    ranks
    #    mean                 olympic              winsor               Judge 1             
    # 1  "The Ghost Hunter"   "The Ghost Hunter"   "The Ghost Hunter"   "Ragdoll"           
    # 2  "Crumble"            "Crumble"            "Crumble"            "The Ghost Hunter"  
    # 3  "Cyanosis"           "Cyanosis"           "Ragdoll"            "Love Cut"          
    # 4  "Room"               "Ragdoll"            "Cyanosis"           "And Then It Rained"
    # 5  "And Then It Rained" "Room"               "And Then It Rained" "Cyanosis"          
    # 6  "Ragdoll"            "And Then It Rained" "Room"               "Don't Leave Me"    
    # 7  "The Demon Within"   "Don't Leave Me"     "Don't Leave Me"     "Mute"              
    # 8  "Mute"               "Mute"               "Mute"               "The Demon Within"  
    # 9  "Love Cut"           "The Demon Within"   "The Demon Within"   "Crumble"           
    # 10 "The Outside Dunny"  "The Outside Dunny"  "The Outside Dunny"  "Compact"           
    # 11 "Compact"            "Love Cut"           "Love Cut"           "Room"              
    # 12 "Don't Leave Me"     "Compact"            "Compact"            "Lungs"             
    # 13 "Catching Climate"   "Lungs"              "Lungs"              "The Outside Dunny" 
    # 14 "Lungs"              "Catching Climate"   "Catching Climate"   "Catching Climate"  
    #    Judge 2              Judge 3              Judge 4              Judge 5             
    # 1  "Mute"               "Crumble"            "Mute"               "Cyanosis"          
    # 2  "Room"               "Don't Leave Me"     "The Demon Within"   "Don't Leave Me"    
    # 3  "The Outside Dunny"  "The Ghost Hunter"   "The Ghost Hunter"   "The Ghost Hunter"  
    # 4  "Cyanosis"           "And Then It Rained" "Crumble"            "The Demon Within"  
    # 5  "The Ghost Hunter"   "Compact"            "Cyanosis"           "And Then It Rained"
    # 6  "Ragdoll"            "Love Cut"           "Compact"            "Room"              
    # 7  "Compact"            "Room"               "The Outside Dunny"  "Ragdoll"           
    # 8  "The Demon Within"   "The Outside Dunny"  "Room"               "Lungs"             
    # 9  "Crumble"            "Catching Climate"   "Catching Climate"   "Crumble"           
    # 10 "Catching Climate"   "Cyanosis"           "Love Cut"           "The Outside Dunny" 
    # 11 "Don't Leave Me"     "The Demon Within"   "Ragdoll"            "Love Cut"          
    # 12 "Love Cut"           "Ragdoll"            "And Then It Rained" "Mute"              
    # 13 "And Then It Rained" "Mute"               "Lungs"              "Catching Climate"  
    # 14 "Lungs"              "Lungs"              "Don't Leave Me"     "Compact"