I've got a data set from two raters judging a set of video clips on multiple (binary) criteria. I'd like to plot a confusion matrix to better understand their agreement/disagreement. But all the examples I've found so far are for cases where each judge rates only one criterion per clip. In my case, judges rate every criterion for each clip.
Say I have 4 binary criteria (A_Con..A_Mod), judged by two raters (A and B), for a set of videoclips (in this case 80):
> str(mydata)
'data.frame': 160 obs. of 6 variables:
$ A_Con: int 0 0 0 0 0 0 0 0 0 0 ...
$ A_Dom: int 0 0 0 1 0 0 0 0 0 0 ...
$ A_Met: int 0 0 0 0 0 0 1 0 0 1 ...
$ A_Mod: int 0 0 0 1 0 1 0 0 0 1 ...
$ Rater: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2 ...
$ Clip : int 1 2 3 4 5 6 7 8 9 10 ...
I can melt this into:
> str(mymolten)
'data.frame': 640 obs. of 4 variables:
$ Rater : Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2 ...
$ Clip : int 1 2 3 4 5 6 7 8 9 10 ...
$ variable: Factor w/ 4 levels "A_Con","A_Dom",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : int 0 0 0 0 0 0 0 0 0 0 ...
But I can't figure out how to cast it into a confusion matrix that would count the combinations (which are not nearly so perfect as this):
                     Rater B
               A_Con A_Dom A_Met A_Mod
        A_Con     19     1     0     0
Rater A A_Dom      1    20     0     0
        A_Met      0     0    20     5
        A_Mod      0     2     0    20
It seems like the table() function is the way to go, but how to format the data?
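This may not be the simplest solution.
You can separate the data for the two raters and merge the resulting data.frames on Clip, so that each row pairs a criterion marked by rater A with a criterion marked by rater B on the same clip.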
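# Sample data: 80 clips, each rated by A and B on 4 binary criteria
set.seed(1)  # make the random sample reproducible
n <- 80
d0 <- data.frame(
  A_Con = round(runif(2*n)),
  A_Dom = round(runif(2*n)),
  A_Met = round(runif(2*n)),
  A_Mod = round(runif(2*n)),
  Rater = rep(c("A","B"), n),
  Clip  = rep(1:n, each=2)
)
library(reshape2)
library(plyr)
# Long format, keeping only the criteria that were marked (value == 1)
d <- melt(d0, id.vars=c("Rater","Clip"))
d <- d[ d$value==1, ]
# One data.frame per rater: which criteria did each mark on each clip?
A <- d[d$Rater=="A",]
B <- d[d$Rater=="B",]
A <- data.frame( Clip=A$Clip, A=A$variable )
B <- data.frame( Clip=B$Clip, B=B$variable )
# Pair A's and B's criteria clip by clip (inner join on Clip)
d <- merge(A, B, by="Clip", all=FALSE)
# Count each (A, B) combination and spread it into a matrix
d <- ddply( d, c("A", "B"), summarize, n=length(Clip) )
dcast( d, A ~ B, value.var="n", fill=0 )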
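Since the question mentions table(), here is a sketch of the same idea in base R only, on a tiny hand-made data set so the counts can be checked by eye. The data values and the names crit, long, both and conf are made up for illustration. Note what the merge implies: a clip where a rater marked several criteria contributes several pairs, and a clip where either rater marked nothing drops out entirely.

```r
# Hypothetical toy data: 3 clips, each rated by A and B
d0 <- data.frame(
  A_Con = c(1, 1, 0, 0, 0, 0),
  A_Dom = c(0, 0, 1, 1, 0, 0),
  A_Met = c(0, 0, 0, 0, 1, 0),
  A_Mod = c(0, 0, 0, 1, 0, 1),
  Rater = rep(c("A", "B"), 3),
  Clip  = rep(1:3, each = 2)
)
crit <- c("A_Con", "A_Dom", "A_Met", "A_Mod")

# Long format without reshape2: one row per (Rater, Clip, criterion)
long <- data.frame(
  Rater    = rep(d0$Rater, times = length(crit)),
  Clip     = rep(d0$Clip,  times = length(crit)),
  variable = factor(rep(crit, each = nrow(d0)), levels = crit),
  value    = unlist(d0[crit], use.names = FALSE)
)
long <- long[long$value == 1, ]  # keep only criteria that were marked

# Split by rater, then pair the marked criteria clip by clip
A <- data.frame(Clip = long$Clip[long$Rater == "A"],
                A    = long$variable[long$Rater == "A"])
B <- data.frame(Clip = long$Clip[long$Rater == "B"],
                B    = long$variable[long$Rater == "B"])
both <- merge(A, B, by = "Clip")

# table() counts every (A, B) combination; unused factor levels show up as 0
conf <- table("Rater A" = both$A, "Rater B" = both$B)
conf
```

In this toy data, rater B marked both A_Dom and A_Mod on clip 2 while rater A marked only A_Dom, so that clip adds one count on the diagonal and one off it. Because variable is a factor with all four levels, the result is always a 4 x 4 table even when some combinations never occur.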