r, arules, traminer

Sequence of patterns in R sequence and events issues


I am trying to work with frequent sequences in R (SPADE). I have the following data set:

    d1 <- c(1:10)
    d2 <- c("nut", "bolt", "screw")
    data <- data.frame(expand.grid(d1,d2))
    data$status <- sample(c("a","b","c"), size = nrow(data), replace = TRUE)
    colnames(data) <- c("day", "widget", "status")

       day widget status
    1    1    nut      c
    2    2    nut      b
    3    3    nut      b
    4    4    nut      b
    5    5    nut      a
    6    6    nut      a
    7    7    nut      b
    8    8    nut      c
    9    9    nut      c
    10  10    nut      b
    11   1   bolt      a
    12   2   bolt      b
    ...

I have not been able to get the data into a format that works with the available packages. I think the basic issue is that most packages expect sequences that are tied to an identity and an event. In my case those don't exist.

I want to answer the question of:

If on any day the status of widget[bolt] is an "a" and widget[screw] is a "c" and on the next day widget[screw] is "b" then on the 3rd day widget[nut] is likely to be "a".

So there is no identity or transaction/event to use. Am I overcomplicating this issue, or is there a package that is well suited for this? So far I have tried arulesSequences and TraMineR.

Thank you


Solution

  • Not sure what you want to do. If you would like to use TraMineR, here is how you could input your data assuming the widgets are your sequence ids:

    library(TraMineR)
    
    ## Transforming into the STS form expected by seqdef()
    sts.data <- seqformat(data, from="SPELL", to="STS", id="widget", 
                          begin="day", end="day", status="status",
                          limit=10)
    
    ## Setting position names and sequence names
    names(sts.data) <- paste0("d", 1:10)
    rownames(sts.data) <- d2
    sts.data
    #       d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
    # nut    b  a  b  b  b  a  c  a  a   a
    # bolt   c  b  a  b  a  c  b  a  c   c
    # screw  a  b  a  a  c  c  b  b  b   c
    
    ## Creating the state sequence object
    sseq <- seqdef(sts.data)
    
    ## Plotting the sequences
    seqiplot(sseq, ytlab="id", ncol=3)
    

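    If the goal is only to check one specific rule like the one in the question, base R may be enough. The sketch below is not a SPADE or TraMineR method. It simply counts how often the antecedent (bolt is "a" and screw is "c" on day t, screw is "b" on day t+1) occurs and how often nut is "a" on day t+2 when it does. The function name `check_rule` is hypothetical, `set.seed()` is added only for reproducibility, and the code assumes every widget has exactly one status per day with no missing days.

```r
# Hypothetical helper: support and confidence of the rule from the question.
# Assumes one row per (day, widget) and consecutive days without gaps.
check_rule <- function(data) {
  # Sort so that each widget's statuses are in day order
  data <- data[order(data$widget, data$day), ]
  # One character vector of statuses per widget, indexed by day position
  s <- split(as.character(data$status), as.character(data$widget))

  n <- length(s$nut)
  t <- seq_len(max(n - 2, 0))  # start days that leave room for t+1 and t+2

  antecedent <- s$bolt[t] == "a" & s$screw[t] == "c" & s$screw[t + 1] == "b"
  consequent <- s$nut[t + 2] == "a"

  support <- sum(antecedent)
  list(
    support    = support,  # number of days where the antecedent holds
    confidence = if (support > 0) sum(antecedent & consequent) / support
                 else NA_real_
  )
}

# Data built as in the question (seed is an assumption for reproducibility)
set.seed(1)
d1 <- c(1:10)
d2 <- c("nut", "bolt", "screw")
data <- data.frame(expand.grid(d1, d2))
data$status <- sample(c("a", "b", "c"), size = nrow(data), replace = TRUE)
colnames(data) <- c("day", "widget", "status")

check_rule(data)
```

    With only 10 days the antecedent will often not occur at all, in which case the confidence is returned as NA. A much longer series would be needed before the estimate means anything.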