
How to represent multichannel event sequences


I'm trying to use TraMineR but am open to feedback/references/links to more info as to how to represent multi-channel or hierarchical event sequences and algorithms that deal with it.

I have a complex event structure that I'm trying to figure out how to represent as a sequence. There are different types of events, and each event type may have a different set of fields (and a different number of fields). For instance, age might be a field in one event type whereas height might be a field in another. My first instinct (and, I believe, a common approach) was to “flatten” everything, i.e. every possible combination of values for an event constitutes a unique event type. However, this may miss patterns in the generic event types.

For example, let's say I'm a dog breeder and drink a lot of coffee and I want to see if there are patterns in my coffee/dog buying habits (yes, silly example). I might have events like:

- Bought dog
  - Breed: hound
  - Sex: female

- Bought coffee
  - Store: Starbucks
  - Roast: dark

- Bought dog
  - Breed: hound
  - Sex: female

- Bought coffee
  - Store: Starbucks
  - Roast: light

- Bought dog
  - Breed: Doberman pinscher
  - Sex: male

To flatten the data, I could say that every unique combination of store and roast is a unique coffee-buying event, and that every unique combination of breed and sex is a unique dog-buying event. This approach would turn the example above into 4 distinct event types (the two hound purchases are identical), rather than 2 event types with fields. This representation could detect patterns such as the following: if I drink 2 dark roast coffees from Starbucks, then I am more likely to buy a male Doberman pinscher.
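As a minimal sketch of this flattening (in Python purely for illustration; the dict layout and the `flatten` helper are assumptions, not part of TraMineR), each event's type and field values can be joined into one label:

```python
# Hypothetical representation: each event is a dict with a "type" key
# plus whatever fields that event type carries.
events = [
    {"type": "dog", "breed": "hound", "sex": "female"},
    {"type": "coffee", "store": "Starbucks", "roast": "dark"},
    {"type": "dog", "breed": "hound", "sex": "female"},
    {"type": "coffee", "store": "Starbucks", "roast": "light"},
    {"type": "dog", "breed": "Doberman pinscher", "sex": "male"},
]


def flatten(event):
    """Join the event type and its field values into one dotted label."""
    values = [str(v) for k, v in event.items() if k != "type"]
    return ".".join([event["type"], *values])


# One flattened label per event, in the original order.
flat_sequence = [flatten(e) for e in events]

# The set of distinct flattened event types (the sequence alphabet).
distinct_types = sorted(set(flat_sequence))

print(flat_sequence)
print(distinct_types)
```

With the example events, the two hound purchases collapse into the same label, `dog.hound.female`, while the two coffee purchases stay distinct because their roasts differ.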

However, this representation may miss more general patterns that don't depend on field values in the events. For instance, it may be the case that I simply buy a dog after having two coffees in general.

I'd like to be able to detect patterns at both "levels" and am unsure of how to represent the events to do so. Of course one approach would be to use both representations and then just combine the results of the two.
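To illustrate the difference between the two levels, the sketch below (Python for illustration only; the tuple layout and the `ngrams` helper are hypothetical) builds a generic sequence and a detailed sequence from the same events and counts adjacent pairs in each:

```python
from collections import Counter

# Hypothetical layout: (event type, first field, second field).
events = [
    ("dog", "hound", "female"),
    ("coffee", "Starbucks", "dark"),
    ("dog", "hound", "female"),
    ("coffee", "Starbucks", "light"),
    ("dog", "Doberman pinscher", "male"),
]


def ngrams(seq, n):
    """Return all contiguous subsequences of length n as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# Generic level: keep only the event type.
generic = [e[0] for e in events]

# Detailed level: flatten type and field values into one label.
detailed = [".".join(e) for e in events]

# Count adjacent pairs at each level.
generic_counts = Counter(ngrams(generic, 2))
detailed_counts = Counter(ngrams(detailed, 2))

print(generic_counts.most_common())
print(detailed_counts.most_common())
```

On this toy data the generic pair coffee then dog occurs twice, whereas no detailed pair repeats, which is the kind of pattern the flattened representation alone would miss.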

So, my questions are:

1. Any links/citations to papers that deal with this?
2. Is this a common issue?
3. Any recommendations on how to represent these events?
4. Any recommendations on how to work with them in TraMineR?
5. Any recommendations / links / references to algorithms that deal with this sort of thing?
6. Any ideas at all?

Thanks!!!


Solution

  • This is actually similar to the question asked here (although they did not know to reference "multi-channel" and the title was vague): Multiple events in traminer

    TraMineR supports multichannel sequences through functions such as seqdistmc.

    The general approach, I believe, is exactly the "flatten" solution I outlined in the question: you combine the values of each channel into one event type. For instance, in my example dog.hound.female would be one event with a single channel/field, replacing the first event, which has 3 separate fields/channels. You then use the usual functions for finding distances, subsequences, etc. The multichannel approach does give you some extra options for setting up substitution costs and computing distances. It also deals with missing values, in case your channels have different lengths or contain gaps.

    This is also similar to what's suggested in the answer to the topic linked above, using the native R function interaction.
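The channel-combining idea can be sketched as follows. This is a Python illustration only, not how TraMineR or R's interaction is implemented; the `combine` helper, the `MISSING` placeholder, and the two example channels are all assumptions made for the sketch:

```python
from itertools import zip_longest

MISSING = "*"  # hypothetical placeholder for a missing state

# Two parallel channels describing the same positions; the second one
# is one position shorter, so its last state is missing.
type_channel = ["dog", "coffee", "dog", "coffee", "dog"]
detail_channel = ["hound", "dark", "hound", "light"]


def combine(*channels, sep="."):
    """Combine aligned channels into one state label per position."""
    combined = []
    # zip_longest pads shorter channels with None up to the longest one.
    for states in zip_longest(*channels, fillvalue=None):
        labels = [MISSING if s is None else str(s) for s in states]
        combined.append(sep.join(labels))
    return combined


combined = combine(type_channel, detail_channel)
print(combined)
```

Each position ends up with a single combined state, so the result can be treated as an ordinary single-channel sequence; how missing positions should really be handled is a modelling choice that this sketch only marks with a placeholder.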