I am trying to mine frequent sequences in R (using SPADE). I have the following data set:
d1 <- c(1:10)
d2 <- c("nut", "bolt", "screw")
data <- data.frame(expand.grid(d1,d2))
data$status <- sample(c("a","b","c"), size = nrow(data), replace = TRUE)
colnames(data) <- c("day", "widget", "status")
   day widget status
1    1    nut      c
2    2    nut      b
3    3    nut      b
4    4    nut      b
5    5    nut      a
6    6    nut      a
7    7    nut      b
8    8    nut      c
9    9    nut      c
10  10    nut      b
11   1   bolt      a
12   2   bolt      b
...
I have not been able to get the data into a format that works with the various packages available. I think the basic issue is that most packages expect sequences that are tied to an identity and an event. In my case, neither exists.
I want to answer the question of:
If on any day the status of widget[bolt] is an "a" and widget[screw] is a "c" and on the next day widget[screw] is "b" then on the 3rd day widget[nut] is likely to be "a".
So there is no identity or transaction/event to use. Am I overcomplicating this issue? Or is there a package that is well suited for this? So far I have tried arulesSequences and TraMineR.
Thank you
Not sure what you want to do. If you would like to use TraMineR, here is how you could input your data, assuming the widgets are your sequence ids:
library(TraMineR)
## Transforming into the STS form expected by seqdef()
sts.data <- seqformat(data, from="SPELL", to="STS", id="widget",
                      begin="day", end="day", status="status",
                      limit=10)
## Setting position names and sequence names
names(sts.data) <- paste0("d", 1:10)
rownames(sts.data) <- d2
sts.data
#       d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
# nut    b  a  b  b  b  a  c  a  a   a
# bolt   c  b  a  b  a  c  b  a  c   c
# screw  a  b  a  a  c  c  b  b  b   c
## Creating the state sequence object
sseq <- seqdef(sts.data)
## Plotting the sequences
seqiplot(sseq, ytlab="id", ncol=3)
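If the goal is only to check one specific rule like the one in the question, a sequence-mining package may not be needed. The following is a minimal base-R sketch, not a SPADE implementation: it reshapes the data to one row per day and counts how often the antecedent and the consequent occur at the right lags. The names wide, lhs, rhs and confidence are illustrative, the set.seed() call is added only for reproducibility, and the code assumes the days are consecutive with no gaps.

```r
## Rebuild the example data from the question (seed is an added assumption)
set.seed(1)
d1 <- 1:10
d2 <- c("nut", "bolt", "screw")
data <- expand.grid(day = d1, widget = d2, stringsAsFactors = FALSE)
data$status <- sample(c("a", "b", "c"), size = nrow(data), replace = TRUE)

## One row per day, one status column per widget:
## day, status.nut, status.bolt, status.screw
wide <- reshape(data, idvar = "day", timevar = "widget", direction = "wide")
wide <- wide[order(wide$day), ]

## Days that are followed by at least two more days
i <- seq_len(nrow(wide) - 2)

## Antecedent: bolt == "a" and screw == "c" on day i, screw == "b" on day i + 1
lhs <- wide$status.bolt[i] == "a" &
       wide$status.screw[i] == "c" &
       wide$status.screw[i + 1] == "b"

## Consequent: nut == "a" on day i + 2
rhs <- wide$status.nut[i + 2] == "a"

## Share of antecedent matches that are followed by the consequent
## (NA when the antecedent never occurs)
support_lhs <- sum(lhs)
confidence <- if (support_lhs > 0) sum(lhs & rhs) / support_lhs else NA_real_
confidence
```

With only 10 days the antecedent will rarely occur, so the estimate is only meaningful on a much longer series.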