Formatting timestamps to avoid R/TraMineR crash?


I have a sequence dataset where the timestamp is in seconds since the epoch:

    id      event       time        end
1  723     opened 1356963741 1356963741
2  722     opened 1356931342 1356931342
3  721 referenced 1356988206 1356988206
4  721 referenced 1356988186 1356988186
5  721     closed 1356988186 1356988186
6  721     merged 1356988186 1356988186
7  721     closed 1356988186 1356988186
8  721     merged 1356988186 1356988186
9  721  discussed 1356966433 1356966433
10 721  discussed 1356963870 1356963870

I want to create an STS sequence object:

sequences.sts <- seqformat(data, from="SPELL", to="STS", 
     begin="time", end="end", id="id", status="event", limit=slmax)
sequences.sts <- seqdef(sequences.sts)
summary(sequences.sts)

However, when I do this, RStudio crashes and more or less freezes my entire computer. Comparing with other code that runs fine and uses single-digit numbers for the "time" column, I think I have identified the timestamp as the problem. Could it be that R/RStudio/TraMineR simply gets overloaded by the long timestamps?


Solution

  • I cannot reproduce the problem, but the most probable reason is that the conversion creates very long sequences. Sequence 721 spans 24,336 seconds (1356988206 - 1356963870), so in STS format it would need a sequence of length 24,336, one position per second. Depending on how many sequences there are and how long the others are, this can take a very long time to compute.

    The problem is that the conversion uses the time unit of your timestamp (seconds). You can try a coarser time unit, such as minutes, hours, or days, possibly aggregating events that occur within the same time unit.

    Hope this helps.
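
As a rough sketch of the suggestion above (not tested against TraMineR itself), the base R code below first measures how many seconds each id spans, then rescales the timestamps to hours counted from the earliest event and drops rows that become exact duplicates. The hourly unit, the global origin, and the names `unit`, `origin`, `begin_h`, `end_h`, and `data_h` are assumptions made for illustration; they do not come from the question.

```r
# Toy copy of the data shown in the question (columns: id, event, time, end).
data <- data.frame(
  id    = c(723, 722, 721, 721, 721, 721, 721, 721, 721, 721),
  event = c("opened", "opened", "referenced", "referenced", "closed",
            "merged", "closed", "merged", "discussed", "discussed"),
  time  = c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186,
            1356988186, 1356988186, 1356988186, 1356966433, 1356963870),
  stringsAsFactors = FALSE
)
data$end <- data$time  # in the question, end equals time for every row

# Span of each id in seconds: a rough lower bound on the STS length
# when one position represents one second.
span_sec <- tapply(data$end, data$id, max) - tapply(data$time, data$id, min)
span_sec[["721"]]  # 24336

# Assumption: one sequence position per hour, counted from the earliest event.
unit   <- 3600
origin <- min(data$time)
data$begin_h <- (data$time - origin) %/% unit + 1
data$end_h   <- (data$end  - origin) %/% unit + 1

# Aggregate: keep one row per (id, event, hour) combination.
data_h <- unique(data[, c("id", "event", "begin_h", "end_h")])
data_h <- data_h[order(data_h$id, data_h$begin_h), ]
rownames(data_h) <- NULL
data_h
```

The rescaled columns could then be passed to `seqformat` as `begin="begin_h"` and `end="end_h"` in place of the raw timestamps. Several different events of the same id can still fall into the same hour (for example "referenced", "closed", and "merged" for id 721), so you would still need to decide which state to keep for that position.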