traminer

SPS format of representative sequence from seqrep


Does anyone know how to extract the representative sequences from seqrep output in SPS format? It would be very helpful for viewers of the plot produced by seqrplot.


Solution

  • You can print the seqrep output in SPS format by passing the format = 'SPS' argument to print. I illustrate with the biofam data:

    library(TraMineR)
    data(biofam)
    biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                    "Child", "Left+Child", "Left+Marr+Child", "Divorced")
    biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)
    
    ## Computing the distance matrix
    costs <- seqcost(biofam.seq, method="INDELSLOG")
    biofam.om <- seqdist(biofam.seq, method="OM", sm=costs$sm, indel=costs$indel)
    
    ## Representative set using the neighborhood density criterion
    biofam.rep <- seqrep(biofam.seq, diss=biofam.om, criterion="density")
    print(biofam.rep, format="SPS")
    
    ##     Sequence         
    ## [1] (0,16)           
    ## [2] (0,9)-(3,1)-(6,6)
    ## [3] (0,6)-(1,10)     
    

    And if you want to retrieve the representative sequences in SPS form, you can use seqformat and seqconc:

    biofam.rep.sps <- seqconc(seqformat(biofam.rep, to='SPS'))
    

    The result is a single-column matrix in which each representative sequence is stored as a character string.
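
To make those strings easier to read next to the seqrplot output, here is a minimal base R sketch that splits each SPS string into its (state, duration) spells and swaps the numeric state codes for their labels. It is not part of TraMineR: `parse_sps` is a hypothetical helper, the three strings are copied from the printed output above, and the mapping assumes the alphabet is 0 to 7 in the same order as `biofam.lab`.

```r
# Representative sequences copied from the printed SPS output above
sps <- c("(0,16)", "(0,9)-(3,1)-(6,6)", "(0,6)-(1,10)")

# State labels in alphabet order (assumption: the alphabet is 0..7,
# matching the order of biofam.lab)
state_labels <- c("Parent", "Left", "Married", "Left+Marr",
                  "Child", "Left+Child", "Left+Marr+Child", "Divorced")

# Hypothetical helper: turn one SPS string into a data frame of spells
parse_sps <- function(s, labels) {
  # Split "(a,b)-(c,d)" into spells, then drop the outer parentheses
  spells <- strsplit(s, ")-(", fixed = TRUE)[[1]]
  spells <- gsub("^\\(|\\)$", "", spells)
  # Each spell is "state,duration"
  parts <- strsplit(spells, ",", fixed = TRUE)
  state <- as.integer(sapply(parts, `[`, 1))
  duration <- as.integer(sapply(parts, `[`, 2))
  # State codes start at 0, R indexing starts at 1
  data.frame(state = labels[state + 1], duration = duration,
             stringsAsFactors = FALSE)
}

parsed <- lapply(sps, parse_sps, labels = state_labels)
parsed[[2]]
```

Under these assumptions, the second representative sequence reads as 9 positions in Parent, 1 in Left+Marr and 6 in Left+Marr+Child. To use the helper on real output, the character strings in `biofam.rep.sps` could be passed in place of `sps`.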