Plotting and sorting of multi-channel sequence objects


I would like to draw sequence index plots of a multi-channel sequence object as a first descriptive step, but I am not sure how to do that properly. The usual way of sorting a single sequence object does not work well here, because there is no nested sorting function that sorts across multiple channels. The best approach I have found is to compute multi-channel sequence distances with seqdistmc, run an MDS on them, and sort all channels by the resulting scores. This requires several decisions (the distance measure, the substitution costs, and so on), which goes beyond what I want for a first description.

  1. Would it be possible to create a nested sort function for multi-channel sequence objects? For example, sort by the first channel from the beginning of the sequences, then break ties (identical sequences) by the second channel, and so on.
    Update: I found an answer to this part of the question using seqHMM, see below.
  2. What would be the best way to sort and plot a multi-channel sequence object for descriptive purposes?

Here is some R code that may help to illustrate my problem:

library(TraMineR)
library(TraMineRextras)

# Building sequence objects
data(biofam)

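## Build one channel per type of event: children, married, left parental home
## biofam states: 0 = parent, 1 = left, 2 = married, 3 = left+married,
## 4 = child, 5 = left+child, 6 = left+married+child, 7 = divorced
bf <- as.matrix(biofam[, 10:25])
children <- bf == 4 | bf == 5 | bf == 6
married <- bf == 2 | bf == 3 | bf == 6
left <- bf == 1 | bf == 3 | bf == 5 | bf == 6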

child.seq <- seqdef(children)
marr.seq <- seqdef(married)
left.seq <- seqdef(left)


# Create unsorted Sequence Index Plots
layout(matrix(c(1,2,3,4,4,4), 2, 3, byrow=TRUE), heights=c(3,1.5))
seqIplot(child.seq, title="Children", withlegend=FALSE)
seqIplot(marr.seq, title="Married", withlegend=FALSE)
seqIplot(left.seq, title="Left parents", withlegend=FALSE)
seqlegend(child.seq, horiz=TRUE, position="top", xpd=TRUE)


# Create sequence Index Plots sorted by alignment of first channel from beginning
mcsort.ch1 <- sortv(child.seq, start="beg")
layout(matrix(c(1,2,3,4,4,4), 2, 3, byrow=TRUE), heights=c(3,1.5))
seqIplot(child.seq, title="Children", sortv=mcsort.ch1, withlegend=FALSE)
seqIplot(marr.seq, title="Married", sortv=mcsort.ch1, withlegend=FALSE)
seqIplot(left.seq, title="Left parents", sortv=mcsort.ch1, withlegend=FALSE)
seqlegend(child.seq, horiz=TRUE, position="top", xpd=TRUE)

# Sequence Index Plots sorted by MDS scores of multi-channel distances

## Calculate multi-channel distances and MDS scores
mcdist <- seqdistmc(channels = list(child.seq, marr.seq, left.seq),
                    method = "OM", sm = list("TRATE", "TRATE", "TRATE"))
mcsort.mds <- cmdscale(mcdist, k = 2, eig = TRUE)

## Create sequence Index Plots sorted by MDS scores of multi-channel distances
layout(matrix(c(1,2,3,4,4,4), 2, 3, byrow=TRUE), heights=c(3,1.5))
seqIplot(child.seq, title="Children", sortv=mcsort.mds$points[,1], withlegend=FALSE)
seqIplot(marr.seq, title="Married", sortv=mcsort.mds$points[,1], withlegend=FALSE)
seqIplot(left.seq, title="Left parents", sortv=mcsort.mds$points[,1], withlegend=FALSE)
seqlegend(child.seq, horiz=TRUE, position="top", xpd=TRUE)
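
Because the plots above sort on the first MDS dimension only, it may help to check how much of the distance structure that dimension captures. The following is a minimal sketch using base R only. The random matrix `x` and the distance object `d` are stand-ins introduced for illustration; in the code above, `mcdist` would take the place of `d`.

```r
# Stand-in data: Euclidean distances between 10 random points.
# In the question's code, `mcdist` from seqdistmc would take the place of `d`.
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
d <- dist(x)

fit <- cmdscale(d, k = 2, eig = TRUE)

# Share of the total (absolute) eigenvalue mass carried by the first dimension.
# Absolute values are used because non-Euclidean distances can yield
# negative eigenvalues.
share_dim1 <- fit$eig[1] / sum(abs(fit$eig))

# cmdscale's own goodness-of-fit values for the k retained dimensions.
gof <- fit$GOF

# Sort variable as used in the plots above: scores on the first dimension.
sort_scores <- fit$points[, 1]
```

A low share would suggest that sorting on the first dimension alone hides a good part of the variation between sequences.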

Solution

  • I just came across the package seqHMM. Its name suggests a different purpose, but it can sort multi-channel sequence objects, so it answers my first question. Here is some example code using seqHMM:

    library(TraMineR)
    library(seqHMM)
    
    # Building sequence objects
    data(biofam)
    
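    ## Build one channel per type of event: children, married, left parental home
    ## biofam states: 0 = parent, 1 = left, 2 = married, 3 = left+married,
    ## 4 = child, 5 = left+child, 6 = left+married+child, 7 = divorced
    bf <- as.matrix(biofam[, 10:25])
    children <- bf == 4 | bf == 5 | bf == 6
    married <- bf == 2 | bf == 3 | bf == 6
    left <- bf == 1 | bf == 3 | bf == 5 | bf == 6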
    
    child.seq <- seqdef(children)
    marr.seq <- seqdef(married)
    left.seq <- seqdef(left)
    
    mcplot <- ssp(list(child.seq, marr.seq, left.seq),
                  type = "I", title = "Sequence index plots",
                  sortv = "from.start", sort.channel = 1,
                  withlegend = FALSE, ylab.pos = c(1, 1.5, 1),
                  ylab = c("Parenthood", "Marriage", "Residence"))
    plot(mcplot)
    

    You can find more about the capabilities of the package in the following paper:
    Helske, Satu; Helske, Jouni (2016): Mixture Hidden Markov Models for Sequence Data: the seqHMM Package in R. Jyväskylä. Available online at https://cran.r-project.org/web/packages/seqHMM/vignettes/seqHMM.pdf.
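
The nested sort described in question 1 (sort by the first channel from the start, then break ties by the second channel, and so on) can also be sketched in base R. The helper `nested_order` below is hypothetical and not part of TraMineR or seqHMM. It assumes every channel is a matrix or data frame with one row per case, in the same row order, and one column per time point. The toy matrices `ch1` and `ch2` are made up for illustration.

```r
# Hypothetical helper (not part of TraMineR or seqHMM): nested sort
# across channels. Returns the row order, like order().
nested_order <- function(channels) {
  # Flatten all channels into one list of columns:
  # channel 1 position 1, channel 1 position 2, ..., channel 2 position 1, ...
  keys <- unlist(lapply(channels, function(ch) {
    as.list(as.data.frame(as.matrix(ch), stringsAsFactors = FALSE))
  }), recursive = FALSE)
  # order() uses each later key only to break ties in the earlier keys.
  # unname() keeps column names from being matched to order()'s arguments.
  do.call(order, unname(keys))
}

# Toy data: 3 cases, 3 time points, 2 channels.
ch1 <- matrix(c("a", "a", "b",
                "a", "a", "a",
                "a", "a", "b"), nrow = 3, byrow = TRUE)
ch2 <- matrix(c("y", "y", "y",
                "x", "x", "x",
                "x", "y", "y"), nrow = 3, byrow = TRUE)

ord <- nested_order(list(ch1, ch2))

# Rank of each case in the nested ordering (inverse permutation of ord).
rank_in_order <- order(ord)
```

In the toy data, cases 1 and 3 are tied on the first channel, so the second channel decides their order. Since the sortv argument of seqIplot takes a sorting variable rather than a permutation, the ranks (`rank_in_order`) would be the value to pass, for example `sortv = order(nested_order(list(child.seq, marr.seq, left.seq)))`, assuming the sequence objects convert cleanly via `as.matrix`.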