Search code examples
rtraminer

How to add a regression line to a TraMineR entropy plot?


I'd like to know if it's somehow implemented to add a linear regression line to a TraMineR entropy plot.

Example:

library(TraMineR)
data(biofam)
set.seed(10)
biofam <- biofam[sample(nrow(biofam),300),]
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)

seqplot(biofam.seq, type="Ht")

I want something like this:

enter image description here

I know that I can calculate the entropy with seqient()

biofam$ient <- seqient(biofam.seq, with.missing=TRUE)

But since the data is in wide format I'm not sure how to grab the data so that I could do an abline(lm(y ~ x)) approach.

> head(biofam[c(1, 10:12, 28)])
     idhous ... a15 a16 a17   ... Entropy
1234     NA   ... 0   0   0 ... 0.3010904
1515  86261   ... 0   0   0 ... 0.3154649
276   17561   ... 0   0   0 ... 0.4012026
1212  69021   ... 0   0   0 ... 0.5517478
153   11391   ... 0   0   0 ... 0.2559298
1164  66641   ... 0   0   0 ... 0.4178207

Also seqplot() does not seem to provide a numerical vector/matrix like plot():

p <- plot(sin, -pi, 2*pi, sub="ordinary plot")
s <- seqplot(biofam.seq, type="Ht")

> str(p)
List of 2
 $ x: num [1:101] -3.14 -3.05 -2.95 -2.86 -2.76 ...
 $ y: num [1:101] -1.22e-16 -9.41e-02 -1.87e-01 -2.79e-01 -3.68e-01 ...

> str(s)
 NULL

Solution

  • The cross-sectional entropy rendered by seqHtplot or seqplot(..., type="Ht") is stored in the $Entropy element of the object returned by seqstatd. The value at each time point is the entropy of the cross-sectional state distribution at this time point. So there is one value per time point.

    This is different form the longitudinal entropy returned by seqient. Here there is one value per sequence. The value is the entropy of the state distribution within the sequence.

    So, to plot the regression line you just need to extract the cross-sectional entropy using seqstatd, define the time variable, and use these variables in abline(lm()):

    stat.bf <- seqstatd(biofam.seq)
    ent <- stat.bf$Entropy
    time <- 1:length(ent)
    
    seqplot(biofam.seq, type = "Ht")
    abline(lm(ent ~ time), col = "red", lty = 2)
    

    enter image description here