
Seqfplot: percentage vs. number of most frequent sequences?


I'm using the R package TraMineR to compute and analyze state sequences. I would like to obtain a sequence frequency plot using the command seqfplot. However, instead of setting the number of most frequent sequences to be plotted using

seqfplot(mydata.seq, tlim=1:20)

it would be useful to plot as many of the most frequent sequences as are needed to reach a given percentage of the sample, for example 50%. I tried this

seqfplot(mydata.seq, trep = 0.5)

but, unlike seqrep.grp and seqrep, the seqfplot command does not support the trep option. Should I write a new function to do that?

Thank you.


Solution

  • You are right: trep is an argument of the TraMineR seqrep function, which looks for representative sequences covering at least a share trep of all sequences.

    If you specifically want the most frequent sequence patterns whose cumulated percent frequencies add up to, say, 50%, then you have to compute the selection filter yourself. Here is how you can do that using the biofam data.

    library(TraMineR)
    data(biofam)
    bf.seq <- seqdef(biofam[,10:25])
    
    ## first retrieve the "Percent" column of the frequency table provided 
    ## as the  "freq" attribute of the object returned by the seqtab function.
    
    bf.freq <- seqtab(bf.seq, tlim=nrow(bf.seq))
    bf.tab <- attr(bf.freq,"freq")
    bf.perct <- bf.tab[,"Percent"]
    
    ## Compute the cumulated percentages
    bf.cumsum <- cumsum(bf.perct)
    
    ## Now we can use the cumulated percentages to select the wanted
    ## patterns, i.e. those whose cumulated percentage does not exceed 50
    bf.freq50 <- bf.freq[bf.cumsum <= 50,]
    
    ## And to plot the frequent patterns
    (nfreq <- length(bf.cumsum[bf.cumsum <= 50]))
    seqfplot(bf.seq, tlim=1:nfreq)
    

    Hope this helps.
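
One detail of the selection above may matter: keeping the patterns with a cumulated percentage `<= 50` usually stops just short of 50%, whereas the question asks for the patterns needed to reach 50%. The sketch below wraps both readings in a small base-R helper. The function name `n_patterns_for_coverage` and its arguments are hypothetical and not part of TraMineR. It assumes the percentages are already sorted in decreasing order, as in the "Percent" column used above.

```r
## Hypothetical helper, not part of TraMineR.
## percent : percent frequencies of the distinct patterns,
##           sorted from most to least frequent
## target  : coverage threshold, in percent
## at_least: TRUE  -> smallest number of patterns reaching the target
##           FALSE -> largest number of patterns not exceeding it
n_patterns_for_coverage <- function(percent, target = 50, at_least = TRUE) {
  cum <- cumsum(percent)
  if (at_least) {
    n <- which(cum >= target)[1]
    ## target never reached (e.g. rounding): fall back to all patterns
    if (is.na(n)) n <- length(cum)
  } else {
    ## same rule as the answer above; can be 0 when the most
    ## frequent pattern alone already exceeds the target
    n <- sum(cum <= target)
  }
  n
}

## Toy percentages, for illustration only
toy <- c(40, 30, 20, 10)
n_reach <- n_patterns_for_coverage(toy, 50)                    # reach 50%
n_below <- n_patterns_for_coverage(toy, 50, at_least = FALSE)  # stay within 50%
```

With the objects from the answer, something like `seqfplot(bf.seq, tlim = 1:n_patterns_for_coverage(bf.perct, 50))` should then plot the patterns needed to reach 50%. The argument name `tlim` is the one used in this thread; as far as I know, more recent TraMineR releases call it `idxs`, so check the documentation of your installed version.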