
Visualizing sequence of states according to an incidence date using TraMineR


I am trying to plot the sequences of states in ne2 relative to an incidence date given in ne3 (data below). The data span an 11-year period, 2004-2015. The incidence date (ne3$date_inc) also falls within this period, but it differs between ids. I'd like to use the incidence date as the reference, so that the distribution of states before and after it can be visualized with seqdplot on an x axis shared by all ids and expressed relative to the incidence date (i.e. months before and after it). However, expressing the state dates relative to an incidence date of zero gives negative values for the states occurring before the incidence. Any idea whether this can be done with TraMineR? Or other suggestions?

library(TraMineR)
ne2 <- structure(list(id = c(4885109L, 4885109L, 4885109L, 7673891L, 
    11453161L, 13785017L, 13785017L, 16400365L), status = structure(c(4L, 
    2L, 3L, 4L, 4L, 1L, 5L, 4L), .Label = c("A", "B", "C", "D", "E"
    ), class = "factor"), date_start = structure(c(12432, 15262, 
    15385, 12432, 12432, 12432, 14318, 12432), class = "Date"), date_end = structure(c(15262, 
    15385, 16450, 16450, 16450, 14318, 16450, 16450), class = "Date")), class = "data.frame", .Names = c("id", 
    "status", "date_start", "date_end"), row.names = c(NA, -8L))

ne3 <- structure(list(id = c(4885109L, 7673891L, 11453161L, 13785017L, 
        16400365L), date_inc = structure(c(15170, 13406, 13528, 13559, 
        15598), class = "Date")), .Names = c("id", "date_inc"), class = "data.frame", row.names = c(NA, 
        -5L))

Solution

  • Here is how you can make the sequences align on their incidence date.

    We start by transforming your SPELL data into the STS format used by TraMineR. Since the sequences are longer than 100 positions, we have to specify the maximum number of columns (limit) of the table that will store the sequences. So we first compute the maximum length of the sequences:

    limit <- max(ne2$date_end) - min(ne2$date_start)
    

    Now we transform the SPELL data into the STS form

    ne2.sts <- seqformat(ne2, id='id', begin='date_start', end='date_end', status='status',
                         from='SPELL', to='STS', limit=as.numeric(limit), process=FALSE)
    
    dim(ne2.sts)
    ## [1]    5 4019
    

    Note that since the start and end dates are provided in Date format, a daily time granularity is used. As a consequence, we get very long sequences of 4019 days.

    Now, we need to shift the sequences to align their incidence date. This can be done with the seqstart function of TraMineRextras.

    For each id, the shift is the difference between its incidence date and the earliest incidence date. So we set the new start date as

    ne3$bd <- ne3$date_inc - min(ne3$date_inc) + min(ne2$date_start)
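
    To make the shift arithmetic concrete, here is a minimal base-R sketch using only the incidence dates from the question. The names shift_days and inc_pos are introduced here for illustration, and the new start date is written as Date plus a number of days, which should be equivalent to the formula above.

    ```r
    # Incidence dates and earliest spell start, copied from the question's data
    date_inc   <- as.Date(c(15170, 13406, 13528, 13559, 15598), origin = "1970-01-01")
    data_start <- as.Date(12432, origin = "1970-01-01")   # min(ne2$date_start)

    # New start date per id: earliest spell start plus the id's delay
    # relative to the earliest incidence date
    bd <- data_start + as.numeric(date_inc - min(date_inc))

    # Number of days by which each id is shifted (0 for the earliest incidence)
    shift_days <- as.numeric(bd - data_start)

    # Distance in days from bd to the incidence date: the same for every id,
    # which is what puts all incidence dates in the same column
    inc_pos <- as.numeric(date_inc - bd)

    print(data.frame(date_inc, bd, shift_days, inc_pos))
    ```

    Because inc_pos is constant across ids, a single set of x labels can be used for all sequences.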
    

    We load TraMineRextras to gain access to seqstart

    library(TraMineRextras)
    

    We shift the sequences, create the state sequence object and plot it with seqdplot. We also define the x labels in number of days from the incidence date.

    ne2.sts.a <- seqstart(ne2.sts, data.start=min(ne2$date_start), new.start=ne3$bd)
    inc.pos <- as.numeric(ne3$date_inc[1] - ne3$bd[1])
    xtlab <- 1:ncol(ne2.sts.a) - inc.pos + 1
    ne2.a.seq <- seqdef(ne2.sts.a, xtstep=365, cnames=xtlab)
    seqdplot(ne2.a.seq, border=NA)
    

    [Figure: chronogram of shifted sequences]

    Note that due to the length of the sequences, it takes a few minutes to generate the plot. I would suggest using monthly data instead of daily data.
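
    As a rough sketch of the monthly suggestion, the spell dates can be converted to month positions relative to each id's incidence month and then shifted by a constant so that no position is negative. This uses base R only; month_index, begin, end and inc_pos are names made up for this illustration, not part of TraMineR.

    ```r
    # Spell data and incidence dates copied from the question
    day <- function(n) as.Date(n, origin = "1970-01-01")
    ne2 <- data.frame(
      id = c(4885109L, 4885109L, 4885109L, 7673891L, 11453161L,
             13785017L, 13785017L, 16400365L),
      status = c("D", "B", "C", "D", "D", "A", "E", "D"),
      date_start = day(c(12432, 15262, 15385, 12432, 12432, 12432, 14318, 12432)),
      date_end   = day(c(15262, 15385, 16450, 16450, 16450, 14318, 16450, 16450))
    )
    ne3 <- data.frame(
      id = c(4885109L, 7673891L, 11453161L, 13785017L, 16400365L),
      date_inc = day(c(15170, 13406, 13528, 13559, 15598))
    )

    # Hypothetical helper: number of calendar months since January 1900
    month_index <- function(d) {
      lt <- as.POSIXlt(d)
      lt$year * 12L + lt$mon
    }

    # Attach each id's incidence date to its spells
    m <- merge(ne2, ne3, by = "id")

    # Months relative to the incidence month (negative before the incidence)
    rel_start <- month_index(m$date_start) - month_index(m$date_inc)
    rel_end   <- month_index(m$date_end)   - month_index(m$date_inc)

    # Shift by a constant so that the earliest position becomes 1
    inc_pos <- 1L - min(rel_start)   # column holding the incidence month
    m$begin <- rel_start + inc_pos
    m$end   <- rel_end + inc_pos

    # Axis labels in months from the incidence, one per column
    xtlab <- seq_len(max(m$end)) - inc_pos

    print(m[, c("id", "status", "begin", "end")])
    ```

    The begin and end columns could then presumably be passed to seqformat in place of the dates, and xtlab to the cnames argument of seqdef, as in the daily version above; I have not checked that step here. Consecutive spells share their boundary month, just as they share their boundary day in the original data.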