
Detecting whether strings arise in a specific order


I'd like to count how many of 5 specific words each of my students could state AND subset/filter for the responses in which those words appear in the correct order. The correct order is: green, yellow, orange, red, black. All the data is in lower case and has no punctuation:

#     Student responses
Id    Data$Colors
1     green yellow orange red black
2     yellow green orange red black
3     red violet pink black
4     purple green orange red black
5     blue pink yellow scarlet   

The output I'm aiming for is:

#   Student responses
Id  Data$Colors                                Data$Count   Data$CorrOrder
1   green yellow orange red black              5            TRUE
2   yellow green orange red blacks             4            FALSE
3   red violet pink black                      2            TRUE
4   purple green orange red black              4            TRUE
5   blue pink yellow brown                     1            NA
6   green yellow orange red very red black     4*           TRUE

* -1 point for repetition.

I've been able to get the count column by doing this:

library(stringr)
patterns <- c("\\bgreen\\b", "\\byellow\\b", "\\borange\\b", "\\bred\\b", "\\bblack\\b")  # kept separate so Data is not overwritten

Data$Count <- str_count(Data$Colors, paste(patterns, collapse = "|"))

However, this doesn't subtract for repeated correct colors like Id 6.

Does anyone know how I could generate Data$CorrOrder?


Solution

  • As a start, if you treat the values as an ordered factor, you can check whether they are sorted, without actually sorting them, using is.unsorted (the data frame is called dat here):

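    colorder <- c("green", "yellow", "orange", "red", "black")
    
    # split into words; anything not in colorder becomes NA
    spl <- lapply(strsplit(dat$Colors, "\\s+"), ordered, levels = colorder)
    # distinct target colors, minus one point for each repeated color
    cnt <- sapply(spl, function(x) length(unique(na.omit(x))) - sum(tabulate(x) > 1))
    cnt
    #[1] 5 4 2 4 1 4
    # TRUE when the target colors appear in the right order
    out <- !sapply(spl, is.unsorted, na.rm = TRUE)
    out[cnt == 1] <- NA  # order is undefined for a single color
    out
    #[1]  TRUE FALSE  TRUE  TRUE    NA  TRUE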
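
To tie this back to the question, here is a minimal end-to-end sketch of the same approach that attaches both columns to the data frame and then filters on the order flag. The dat data frame is an assumption rebuilt from the six rows of the expected-output table in the question, and in_order is just an illustrative name.

```r
# Assumed input: the six responses from the expected-output table
dat <- data.frame(
  Id = 1:6,
  Colors = c("green yellow orange red black",
             "yellow green orange red blacks",
             "red violet pink black",
             "purple green orange red black",
             "blue pink yellow brown",
             "green yellow orange red very red black"),
  stringsAsFactors = FALSE
)

colorder <- c("green", "yellow", "orange", "red", "black")

# Split each response into words; words outside colorder become NA
spl <- lapply(strsplit(dat$Colors, "\\s+"), ordered, levels = colorder)

# Distinct target colors named, minus one point per repeated color
dat$Count <- sapply(spl, function(x) {
  length(unique(na.omit(x))) - sum(tabulate(x) > 1)
})

# TRUE when the target colors appear in non-decreasing order;
# NA when only one target color was named
dat$CorrOrder <- !sapply(spl, is.unsorted, na.rm = TRUE)
dat$CorrOrder[dat$Count == 1] <- NA

# Keep only the responses flagged as correctly ordered (NA rows are dropped)
in_order <- dat[which(dat$CorrOrder), ]
in_order$Id
#[1] 1 3 4 6
```

Wrapping the condition in which() drops the NA row instead of returning a row full of NAs, which is what plain logical indexing would do.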