I manage a film festival that holds an annual 'best film' competition. There can be up to 50 films and any number of judges each year.
I plan to use R to analyse the various judges' scores then publish a report in Markdown that I'll distribute to them.
Here is what the dataframe with example raw data will look like:
Film = c("Vertigo", "Fargo", "Platoon", "Beetlejuice", "Zodiac")
Mean = c( 7.45,6.98,7.5,7.2,5.82)
TrimmedMean = c(7.42, 7.04,7.36,7.05,6.26)
Judge1 = c(5.56,6.00,7.00,6.22,5.33)
Judge2 = c(6.89,8.81,8.83,7.22,6.78)
Judge3 = c(9.25,6.38,6.71,6.88,6.50)
Judge4 = c(8.00,7.67,7.44,6.89,7.22)
Judge5 = c(5.11,4.89,7.11,8.78,6.44)
Judge6 = c(9.89,8.11,7.89,7.22,2.67)
RawData = data.frame(Film,Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)
The 'Average' and 'Trimmed Average' columns can be considered as additional 'judges' for the purpose of this exercise.
I want to rank the films for each of the judges by their scores and then remove the numerical values, leaving a sorted list of films that looks like this:
Mean = c('Platoon','Vertigo','Beetlejuice','Fargo','Zodiac')
TrimmedMean = c('Vertigo','Platoon','Beetlejuice','Fargo','Zodiac')
Judge1 = c('Platoon','Beetlejuice','Fargo','Vertigo','Zodiac')
Judge2 = c('Platoon','Fargo','Beetlejuice','Vertigo','Zodiac')
Judge3 = c('Vertigo','Beetlejuice','Platoon','Zodiac','Fargo')
Judge4 = c('Vertigo','Fargo','Platoon','Zodiac','Beetlejuice')
Judge5 = c('Beetlejuice','Platoon','Zodiac','Vertigo','Fargo')
Judge6 = c('Vertigo','Fargo','Platoon','Beetlejuice','Zodiac')
SortedData = data.frame(Mean,TrimmedMean,Judge1,Judge2,Judge3,Judge4,Judge5,Judge6)
The master dataset from which I produced the first table has data going back 5 years which I can subset by Year. I would like to be able to analyse data from different years without having to make changes in the code for each year other than specifying the year to subset. The code should be responsive to datasets from different years which will have different numbers of judges and films.
Any ideas?
Here is the R code and the CSV file for my analysis. I've changed the names of the judges to Judge 1, Judge 2, etc for privacy.
I think the function you are looking for is order()
. It shows the relative order of the values in a vector.
order(c(1.65, 6.92, 7.98, 3.42, 5.75), decreasing=TRUE)
# [1] 3 2 5 4 1
We can use this order, for each of the judges, to lookup the names in the Film
column.
data.frame(lapply(RawData[, -1], function(x) RawData$Film[order(x, decreasing=TRUE)]))
# Mean TrimmedMean Judge1 Judge2 Judge3 Judge4 Judge5 Judge6
# 1 Platoon Vertigo Platoon Platoon Vertigo Vertigo Beetlejuice Vertigo
# 2 Vertigo Platoon Beetlejuice Fargo Beetlejuice Fargo Platoon Fargo
# 3 Beetlejuice Beetlejuice Fargo Beetlejuice Platoon Platoon Zodiac Platoon
# 4 Fargo Fargo Vertigo Vertigo Zodiac Zodiac Vertigo Beetlejuice
# 5 Zodiac Zodiac Zodiac Zodiac Fargo Beetlejuice Fargo Zodiac
The same, but a bit more elegant (I think) and faster
SortedData <- RawData[, -1]
SortedData[] <- RawData$Film[sapply(RawData[, -1], order, decreasing=TRUE)]
SortedData
# Mean TrimmedMean Judge1 Judge2 Judge3 Judge4 Judge5 Judge6
# 1 Platoon Vertigo Platoon Platoon Vertigo Vertigo Beetlejuice Vertigo
# 2 Vertigo Platoon Beetlejuice Fargo Beetlejuice Fargo Platoon Fargo
# 3 Beetlejuice Beetlejuice Fargo Beetlejuice Platoon Platoon Zodiac Platoon
# 4 Fargo Fargo Vertigo Vertigo Zodiac Zodiac Vertigo Beetlejuice
# 5 Zodiac Zodiac Zodiac Zodiac Fargo Beetlejuice Fargo Zodiac
Using the linked data
dtf <- structure(list(Judge=rep(paste("Judge", 1:5), each=14),
Year=rep(2023, 70), Film=rep(c("And Then It Rained",
"Catching Climate", "Compact", "Crumble", "Cyanosis",
"Don't Leave Me", "Love Cut", "Lungs", "Mute", "Ragdoll",
"Room", "The Demon Within", "The Ghost Hunter",
"The Outside Dunny"), 5), Originality.Creativity=c(7, 6, 7, 7, 6,
6, 6, 7, 6, 7, 7, 5, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 7, 6, 7, 6,
6, 7, 5, 5, 5, 8, 3, 8, 7, 3, 3, 3, 5, 3, 7, 7, 8, 8, 9, 7, 9,
5, 9, 7, 10, 8, 9, 10, 9, 8, 7, 7, 7, 6, 8, 8, 6, 8, 6, 6, 7,
8, 8, 8), Direction=c(6, 5, 6, 6, 6, 6, 7, 6, 6, 7, 6, 6, 7, 5,
6, 7, 7, 7, 8, 7, 7, 7, 8, 7, 8, 7, 7, 8, 7, 5, 6, 8, 4, 8, 6,
3, 4, 5, 7, 4, 8, 6, 9, 8, 8, 9, 9, 4, 9, 7, 10, 8, 8, 9, 9, 9,
8, 6, 7, 8, 8, 9, 6, 7, 7, 7, 8, 8, 8, 7), Writing=c(6, 5, 6,
6, 6, 6, 6, 5, 6, 7, 6, 6, 7, 6, 7, 7, 7, 7, 7, 6, 7, 6, 8, 8,
8, 7, 7, 8, 7, 5, 8, 9, 4, 8, 7, 4, 4, 4, 8, 3, 8, 7, 8, 9, 9,
9, 8, 3, 8, 6, 9, 7, 8, 9, 9, 9, 8, 6, 5, 6, 8, 8, 6, 7, 6, 7,
8, 7, 8, 7), Cinematography=c(6, 5, 6, 6, 6, 6, 6, 5, 6, 8, 6,
6, 7, 5, 7, 7, 8, 7, 8, 7, 7, 6, 8, 8, 8, 7, 8, 8, 8, 5, 6, 9,
6, 8, 5, 3, 5, 6, 5, 5, 7, 5, 9, 8, 10, 10, 10, 4, 8, 7, 9, 10,
9, 9, 10, 9, 8, 6, 6, 7, 9, 8, 7, 7, 7, 8, 7, 8, 9, 6),
Performance=c(6, 5, 5, 6, 6, 6, 7, 5, 6, 8, 5, 6, 7, 5, 7, 7,
8, 7, 8, 7, 7, 7, 9, 8, 8, 7, 8, 8, 6, 5, 8, 9, 5, 8, 5, 2, 6,
6, 4, 4, 8, 5, 9, 9, 10, 10, 10, 4, 9, 10, 10, 10, 9, 10, 10,
9, 8, 6, 6, 7, 9, 8, 6, 8, 8, 7, 7, 7, 8, 6),
ProductionValue=c(6, 5, 5, 6, 6, 6, 6, 5, 6, 8, 5, 6, 7, 5, 7,
8, 7, 7, 8, 7, 6, 7, 8, 7, 8, 7, 8, 8, 7, 5, 6, 8, 5, 8, 6, 3,
4, 4, 6, 5, 8, 5, 8, 8, 8, 9, 9, 3, 8, 7, 10, 8, 8, 9, 9, 8, 8,
6, 6, 8, 9, 8, 7, 7, 7, 8, 7, 8, 7, 6), Pacing=c(6, 5, 5, 5, 6,
6, 6, 5, 6, 7, 5, 6, 7, 5, 7, 7, 7, 6, 8, 7, 7, 7, 8, 8, 8, 7,
8, 8, 7, 5, 6, 8, 5, 8, 6, 3, 5, 3, 6, 5, 8, 5, 7, 9, 7, 9, 8,
2, 9, 8, 9, 8, 9, 10, 9, 9, 7, 7, 6, 7, 8, 8, 7, 7, 6, 7, 8, 7,
6, 7), Structure=c(6, 5, 6, 5, 6, 6, 6, 5, 6, 6, 5, 6, 7, 5, 7,
7, 7, 7, 8, 7, 7, 6, 8, 7, 8, 7, 8, 8, 7, 5, 6, 9, 5, 8, 7, 3,
4, 4, 6, 6, 8, 4, 7, 9, 8, 8, 8, 1, 8, 8, 9, 7, 8, 9, 10, 9, 7,
7, 6, 7, 8, 8, 7, 7, 7, 7, 8, 7, 7, 7), Sound.Music=c(6, 5, 6,
6, 6, 6, 6, 6, 6, 7, 6, 7, 7, 5, 6, 5, 6, 8, 7, 7, 7, 6, 8, 6,
8, 8, 7, 7, 7, 5, 6, 9, 5, 8, 7, 3, 4, 6, 5, 7, 8, 5, 8, 8, 9,
9, 9, 1, 8, 8, 9, 8, 9, 9, 9, 8, 6, 6, 6, 8, 8, 7, 8, 7, 6, 9,
7, 8, 9, 7)), row.names=c(NA, -70L), class="data.frame")
# Order the data.frame, just in case
dtf <- dtf[with(dtf, order(Judge, Film, Year)),]
rownames(dtf) <- NULL
# Split the data.frame based on Judge
dtf.s <- split(dtf, ~ Judge)
# Check if all the films are the same
s <- sapply(dtf.s, function(x) x$Film)
stopifnot(all(s[, 1] == s[, -1]))
# Calculate average scores per judge
ascores <- sapply(dtf.s,
function(x) rowMeans(x[,-(1:3)])
)
ascores
# Judge 1 Judge 2 Judge 3 Judge 4 Judge 5
# 1 6.1111111 6.6666667 6.7777778 8.1111111 7.4444444
# 2 5.1111111 6.7777778 5.0000000 8.4444444 6.3333333
# 3 5.7777778 7.0000000 6.3333333 8.6666667 6.1111111
# 4 5.8888889 6.8888889 8.5555556 8.8888889 7.1111111
# 5 6.0000000 7.5555556 4.6666667 8.8888889 8.3333333
# 6 6.0000000 6.7777778 8.0000000 3.0000000 8.0000000
# 7 6.2222222 6.7777778 6.2222222 8.4444444 6.6666667
# 8 5.4444444 6.4444444 3.0000000 7.5555556 7.2222222
# 9 6.0000000 8.0000000 4.3333333 9.4444444 6.6666667
# 10 7.2222222 7.2222222 4.5555556 8.2222222 7.3333333
# 11 5.6666667 7.8888889 5.7777778 8.5555556 7.4444444
# 12 6.0000000 7.0000000 4.6666667 9.3333333 7.5555556
# 13 7.0000000 7.4444444 7.7777778 9.3333333 7.7777778
# 14 5.3333333 7.7777778 5.4444444 8.6666667 6.7777778
# Calculate average score across judges
avg.score <- rowMeans(ascores)
# Calculate olympic average score across judges
# Discard highest and lowest values
olympic <- function(x) {
(sum(x)-sum(range(x)))/(length(x)-2)
}
oavg.score <- apply(ascores, 1, olympic)
# Calculate winsorized average score across judges
# Clamp values to second highest and second lowest
winsor <- function(x) {
xtr <- x[-c(which.max(x), which.min(x))]
(sum(xtr)+sum(range(xtr)))/length(x)
}
wavg.score <- apply(ascores, 1, winsor)
# Combine scores into a matrix
scores <- cbind(mean=avg.score, olympic=oavg.score, winsor=wavg.score, ascores)
# Get film rankings, highest scoring at the top
ranks <- scores
ranks[] <- dtf.s[[c(1, 3)]][apply(scores, 2, order, decreasing=TRUE)]
ranks
# mean olympic winsor Judge 1
# 1 "The Ghost Hunter" "The Ghost Hunter" "The Ghost Hunter" "Ragdoll"
# 2 "Crumble" "Crumble" "Crumble" "The Ghost Hunter"
# 3 "Cyanosis" "Cyanosis" "Ragdoll" "Love Cut"
# 4 "Room" "Ragdoll" "Cyanosis" "And Then It Rained"
# 5 "And Then It Rained" "Room" "And Then It Rained" "Cyanosis"
# 6 "Ragdoll" "And Then It Rained" "Room" "Don't Leave Me"
# 7 "The Demon Within" "Don't Leave Me" "Don't Leave Me" "Mute"
# 8 "Mute" "Mute" "Mute" "The Demon Within"
# 9 "Love Cut" "The Demon Within" "The Demon Within" "Crumble"
# 10 "The Outside Dunny" "The Outside Dunny" "The Outside Dunny" "Compact"
# 11 "Compact" "Love Cut" "Love Cut" "Room"
# 12 "Don't Leave Me" "Compact" "Compact" "Lungs"
# 13 "Catching Climate" "Lungs" "Lungs" "The Outside Dunny"
# 14 "Lungs" "Catching Climate" "Catching Climate" "Catching Climate"
# Judge 2 Judge 3 Judge 4 Judge 5
# 1 "Mute" "Crumble" "Mute" "Cyanosis"
# 2 "Room" "Don't Leave Me" "The Demon Within" "Don't Leave Me"
# 3 "The Outside Dunny" "The Ghost Hunter" "The Ghost Hunter" "The Ghost Hunter"
# 4 "Cyanosis" "And Then It Rained" "Crumble" "The Demon Within"
# 5 "The Ghost Hunter" "Compact" "Cyanosis" "And Then It Rained"
# 6 "Ragdoll" "Love Cut" "Compact" "Room"
# 7 "Compact" "Room" "The Outside Dunny" "Ragdoll"
# 8 "The Demon Within" "The Outside Dunny" "Room" "Lungs"
# 9 "Crumble" "Catching Climate" "Catching Climate" "Crumble"
# 10 "Catching Climate" "Cyanosis" "Love Cut" "The Outside Dunny"
# 11 "Don't Leave Me" "The Demon Within" "Ragdoll" "Love Cut"
# 12 "Love Cut" "Ragdoll" "And Then It Rained" "Mute"
# 13 "And Then It Rained" "Mute" "Lungs" "Catching Climate"
# 14 "Lungs" "Lungs" "Don't Leave Me" "Compact"