I am trying to generate a plot relating the ne2 sequence of states to an incidence date in ne3 (data below). I have data spanning an 11-year period, from 2004 to 2015. The incidence date (ne3$date_inc) also falls within this 11-year period, but the incidence dates differ between ids. I'd like to use the incidence date as the reference, so that the distribution of states before and after the incidence date for each id can be visualized with seqdplot, with an x axis that shares a common reference (i.e. months before and after the incidence date). However, expressing the state dates relative to the incidence date as zero gives negative values for the states that occur before the incidence. Any idea whether this can be done using TraMineR? Or other suggestions?
library(TraMineR)
ne2 <- structure(list(id = c(4885109L, 4885109L, 4885109L, 7673891L,
11453161L, 13785017L, 13785017L, 16400365L), status = structure(c(4L,
2L, 3L, 4L, 4L, 1L, 5L, 4L), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), date_start = structure(c(12432, 15262,
15385, 12432, 12432, 12432, 14318, 12432), class = "Date"), date_end = structure(c(15262,
15385, 16450, 16450, 16450, 14318, 16450, 16450), class = "Date")), class = "data.frame", .Names = c("id",
"status", "date_start", "date_end"), row.names = c(NA, -8L))
ne3 <- structure(list(id = c(4885109L, 7673891L, 11453161L, 13785017L,
16400365L), date_inc = structure(c(15170, 13406, 13528, 13559,
15598), class = "Date")), .Names = c("id", "date_inc"), class = "data.frame", row.names = c(NA,
-5L))
Here is how you can align the sequences on their incidence date.
We start by transforming your SPELL data into the STS format used by TraMineR. Since the sequences are longer than 100, we have to specify the maximum number of columns (limit) of the table that will store the sequences.
So we first compute the maximum length of the sequences:
limit <- max(ne2$date_end) - min(ne2$date_start)
Now we transform the SPELL data into the STS form
ne2.sts <- seqformat(ne2, id='id', begin='date_start', end='date_end', status='status',
from='SPELL', to='STS', limit=as.numeric(limit), process=FALSE)
dim(ne2.sts)
## [1] 5 4019
Note that since the start and end dates are provided in date format, a daily time granularity is used. As a consequence, we get very long sequences of 4019 days.
Now we need to shift the sequences to align their incidence dates. This can be done with the seqstart function of TraMineRextras.
The shift for each sequence is the difference between its incidence date and the earliest incidence date. So we set the new start date as
ne3$bd <- ne3$date_inc - min(ne3$date_inc) + min(ne2$date_start)
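As a quick sanity check of the shift above, the following base R sketch recomputes the new start dates from the dates given in the question and confirms that every incidence date ends up at the same distance from its shifted start. The names date_inc, data_start, bd and offset are local to this sketch, and the shift is added as a plain number of days, which should be equivalent to the expression above.

```r
# Dates copied from the question, stored as days since 1970-01-01
date_inc   <- as.Date(c(15170, 13406, 13528, 13559, 15598), origin = "1970-01-01")
data_start <- as.Date(12432, origin = "1970-01-01")  # same value as min(ne2$date_start)

# Shift of each sequence in days, relative to the earliest incidence date
shift <- as.numeric(date_inc - min(date_inc))

# New start date of each sequence
bd <- data_start + shift

# Days between each shifted start and the corresponding incidence date
offset <- as.numeric(date_inc - bd)
print(offset)
```

If the shift is right, offset holds the same value for every id, which is why a single position can later be used as day zero on the x axis.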
We load TraMineRextras to gain access to seqstart:
library(TraMineRextras)
We shift the sequences, create the state sequence object, and plot it with seqdplot. We also define the x labels as the number of days from the incidence date.
ne2.sts.a <- seqstart(ne2.sts, data.start=min(ne2$date_start), new.start=ne3$bd)
inc.pos <- as.numeric(ne3$date_inc[1] - ne3$bd[1])
xtlab <- 1:ncol(ne2.sts.a) - inc.pos + 1
ne2.a.seq <- seqdef(ne2.sts.a, xtstep=365, cnames=xtlab)
seqdplot(ne2.a.seq, border=NA)
Note that due to the length of the sequences, it takes a few minutes to generate the plot. I would suggest using monthly data instead of daily data.
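A minimal base R sketch of what the monthly suggestion could look like, assuming calendar months are an acceptable granularity: each spell's start and end are expressed as whole months relative to the id's incidence date. The helper months_between and the columns m_start and m_end are hypothetical names introduced for this illustration, not part of TraMineR.

```r
# Example data copied from the question (dates as days since 1970-01-01)
ne2 <- data.frame(
  id = c(4885109L, 4885109L, 4885109L, 7673891L,
         11453161L, 13785017L, 13785017L, 16400365L),
  status = c("D", "B", "C", "D", "D", "A", "E", "D"),
  date_start = as.Date(c(12432, 15262, 15385, 12432, 12432, 12432, 14318, 12432),
                       origin = "1970-01-01"),
  date_end = as.Date(c(15262, 15385, 16450, 16450, 16450, 14318, 16450, 16450),
                     origin = "1970-01-01")
)
ne3 <- data.frame(
  id = c(4885109L, 7673891L, 11453161L, 13785017L, 16400365L),
  date_inc = as.Date(c(15170, 13406, 13528, 13559, 15598), origin = "1970-01-01")
)

# Hypothetical helper: number of calendar months from `from` to `to`
# (negative when `to` falls in an earlier month than `from`)
months_between <- function(from, to) {
  a <- as.POSIXlt(from)
  b <- as.POSIXlt(to)
  (b$year - a$year) * 12 + (b$mon - a$mon)
}

# Attach each id's incidence date to its spells
ne2$date_inc <- ne3$date_inc[match(ne2$id, ne3$id)]

# Spell boundaries in months relative to the incidence month (0 = incidence)
ne2$m_start <- months_between(ne2$date_inc, ne2$date_start)
ne2$m_end   <- months_between(ne2$date_inc, ne2$date_end)

print(ne2[, c("id", "status", "m_start", "m_end")])
```

Spells before the incidence get negative month values here. Before converting to sequences, one option is to add a constant so that the smallest value becomes 1, and then relabel the columns with the relative months, in the same way as the xtlab labels above.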