I'd like to count how many of 5 specific words each student stated, AND subset/filter the responses in which those words were in the correct order. The correct order is green, yellow, orange, red, black. All the data is lowercase and has no punctuation:
# Student responses
Id Data$Colors
1 green yellow orange red black
2 yellow green orange red blacks
3 red violet pink black
4 purple green orange red black
5 blue pink yellow brown
6 green yellow orange red very red black
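A reproducible version of the responses may help anyone trying this. This is a sketch: the column name `Colors` follows the question, and the six rows are copied from the expected-output table, so they include Id 6 and the misspelled "blacks".

```r
# Student responses as a data frame (rows copied from the expected-output table)
Data <- data.frame(
  Id = 1:6,
  Colors = c(
    "green yellow orange red black",
    "yellow green orange red blacks",
    "red violet pink black",
    "purple green orange red black",
    "blue pink yellow brown",
    "green yellow orange red very red black"
  ),
  stringsAsFactors = FALSE  # keep Colors as character so strsplit() works in older R
)
```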
The output I'm aiming for is:
# Student responses
Id Data$Colors Data$Count Data$CorrOrder
1 green yellow orange red black 5 TRUE
2 yellow green orange red blacks 4 FALSE
3 red violet pink black 2 TRUE
4 purple green orange red black 4 TRUE
5 blue pink yellow brown 1 NA
6 green yellow orange red very red black 4* TRUE
*-1 point for repetition ("red" is said twice).
I've been able to get the count column with this:
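library(stringr)
# Word-boundary patterns for the five colors (stored under their own name so the Data data frame is not overwritten)
patterns <- c("\\bgreen\\b", "\\byellow\\b", "\\borange\\b", "\\bred\\b", "\\bblack\\b")
Data$Count <- str_count(Data$Colors, paste(patterns, collapse = "|"))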
However, this doesn't deduct a point for repeated colors: for Id 6 it counts both occurrences of "red" and returns 6 instead of 4.
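For the count alone, the regex approach can be kept if the matches are extracted rather than just counted, and each response is then scored as (distinct colors matched) minus (number of colors said more than once). A minimal sketch, assuming that scoring rule is what "-1 point for repetition" means; it uses base R `regmatches()`/`gregexpr()` instead of stringr, and the names `targets`, `colors`, and `matches` are illustrative.

```r
targets <- c("green", "yellow", "orange", "red", "black")
# One alternation wrapped in word boundaries, so "blacks" does not count as "black"
pattern <- paste0("\\b(", paste(targets, collapse = "|"), ")\\b")

colors <- c(
  "green yellow orange red black",
  "yellow green orange red blacks",
  "red violet pink black",
  "purple green orange red black",
  "blue pink yellow brown",
  "green yellow orange red very red black"
)

# List of the matched color words in each response
matches <- regmatches(colors, gregexpr(pattern, colors, perl = TRUE))

# Plain number of matches, like str_count(): no deduction for repeats
raw_count <- lengths(matches)
# Distinct colors, minus one point for each color that appears more than once
count <- vapply(matches,
                function(m) length(unique(m)) - sum(table(m) > 1),
                integer(1))
raw_count
# [1] 5 4 2 4 1 6
count
# [1] 5 4 2 4 1 4
```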
Does anyone know how I could generate Data$CorrOrder?
As a start, if you treat the values as an ordered factor, you can check whether they are sorted, without actually sorting them, using is.unsorted:
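colorder <- c("green", "yellow", "orange", "red", "black")
# Split each response into words; words not in colorder become NA
spl <- lapply(strsplit(Data$Colors, "\\s+"), ordered, levels=colorder)
# Distinct colors, minus one for each color said more than once
cnt <- sapply(spl, function(x) length(unique(na.omit(x))) - sum(tabulate(x) > 1) )
cnt
#[1] 5 4 2 4 1 4
# Non-strict check ignoring NA, so a repeat like "red ... red" does not break the order
out <- !sapply(spl, is.unsorted, na.rm=TRUE)
# Order is undefined with fewer than two colors
out[cnt < 2] <- NA
out
#[1] TRUE FALSE TRUE TRUE NA TRUE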
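The count and order checks above can be wrapped into one helper whose results are written back to the data frame. This is a sketch, assuming the responses are in a character column `Data$Colors` as in the question. `score_responses()` is a hypothetical name, and instead of an ordered factor it uses `match()` to get each word's position in `colorder`. These are the same integer codes, so `is.unsorted()` behaves the same way. Responses with fewer than two colors get `NA`, which also covers responses with no colors at all.

```r
colorder <- c("green", "yellow", "orange", "red", "black")

# Hypothetical helper: Count and CorrOrder for a character vector of responses
score_responses <- function(responses, target = colorder) {
  words <- strsplit(responses, "\\s+")
  # Position of each word in the target order; NA for words not in the list
  pos <- lapply(words, match, table = target)
  count <- vapply(pos, function(p) {
    p <- p[!is.na(p)]
    # Distinct colors, minus one for each color that appears more than once
    length(unique(p)) - sum(tabulate(p, nbins = length(target)) > 1)
  }, integer(1))
  # Non-strict check (strictly = FALSE by default), so repeated colors are allowed
  in_order <- !vapply(pos, is.unsorted, logical(1), na.rm = TRUE)
  in_order[count < 2] <- NA  # order is undefined with fewer than two colors
  data.frame(Count = count, CorrOrder = in_order)
}

Data <- data.frame(
  Id = 1:6,
  Colors = c(
    "green yellow orange red black",
    "yellow green orange red blacks",
    "red violet pink black",
    "purple green orange red black",
    "blue pink yellow brown",
    "green yellow orange red very red black"
  ),
  stringsAsFactors = FALSE
)
Data <- cbind(Data, score_responses(Data$Colors))
Data
```

To subset to the correctly ordered responses afterwards, `Data[which(Data$CorrOrder), ]` drops both the `FALSE` and the `NA` rows.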