Specifying a cohort when computing the most frequent subsequences by generation


I have a categorical variable "generations". I want to compute the most frequent subsequences for each generation with TraMineR, but I cannot figure out how to restrict the computation to a given cohort. I have tried every solution I could think of, but nothing has worked so far. This is the code I cannot adapt:

GGS.seqe <- seqecreate(GGS.seq, tevent = "state")
fsubseq <- seqefsub(GGS.seqe, pMinSupport=0.01)
fsubseq[1:50]

Solution

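  • Assuming you have a factor cohort with one value per sequence in GGS.seq, in the same row order (for example built from your generations variable), here is how to get a list holding the set of most frequent subsequences for each cohort: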

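    library(TraMineR)
    
    ncohort <- length(levels(cohort))    # number of cohorts
    mostfreq <- vector("list", ncohort)  # empty list, one slot per cohort
    names(mostfreq) <- levels(cohort)    # label each slot with its cohort
    
    # convert the state sequences into event sequences once
    GGS.seqe <- seqecreate(GGS.seq, tevent = "state")
    for (i in seq_len(ncohort)) {
      # mine only the event sequences of cohort i; [[ ]] stores the whole result object
      mostfreq[[i]] <- seqefsub(GGS.seqe[cohort == levels(cohort)[i]], pMinSupport = 0.01)
    }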
    

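    You then access each element of the list with double brackets, e.g., mostfreq[[2]] for the second cohort. Single brackets (mostfreq[2]) return a one-element sub-list rather than the result itself, which is also why the assignment inside the loop must use mostfreq[[i]]: with mostfreq[i] <- ..., R keeps only the first component of the list-based object returned by seqefsub() and issues a warning.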
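A reproducible sketch of the same idea follows, under some assumptions: the biofam example data shipped with TraMineR stands in for GGS.seq, and a grouping of its birthyr column stands in for the generations variable (the cut points and labels are arbitrary, chosen only for illustration). It uses lapply() instead of an explicit loop; both approaches build the same kind of list, one seqefsub() result per cohort.

```r
library(TraMineR)

# biofam ships with TraMineR: family-life trajectories from age 15 to 30
# stored in columns 10:25, plus each respondent's birth year (birthyr)
data(biofam)
biofam.seq <- seqdef(biofam, 10:25)

# Illustrative cohort factor (arbitrary cut points, assumed for this example):
# one value per sequence, in the same row order as biofam.seq
cohort <- cut(biofam$birthyr,
              breaks = c(-Inf, 1939, 1949, Inf),
              labels = c("up to 1939", "1940-1949", "1950 and later"))

# Convert the state sequences into event sequences once
biofam.seqe <- seqecreate(biofam.seq, tevent = "state")

# One list element per cohort: frequent subsequences mined within that cohort only
mostfreq <- lapply(levels(cohort), function(lev) {
  seqefsub(biofam.seqe[cohort == lev], pMinSupport = 0.05)
})
names(mostfreq) <- levels(cohort)

# Ten most frequent subsequences of the 1940-1949 cohort
mostfreq[["1940-1949"]][1:10]
```

Because the list is named after the factor levels, each cohort's result can be retrieved by its label as well as by its position.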