Tags: r, traminer, sequence-analysis

Format of output for seqecmpgroup() function?


The seqecmpgroup() function returns a table that, among other things, includes frequencies for each of the specified groups. However, when I run it, it generates frequencies below 1 (e.g. 0.00035). Should I interpret these frequencies as percentages showing in how many of the groups each subsequence occurs?

Below I've pasted an example output (the frequencies for each group are listed as "Freq.1", "Freq.2", etc.):

      Subsequence     Support     p.value statistic index      Freq.1
1      (FA)-(IN)-(FA) 0.004807692 0.002293660 12.155213   538 0.000000000
2 (NR)-(TR)-(EX)-(IN) 0.004807692 0.002293660 12.155213   685 0.000000000
3 (NR)-(TR)-(IN)-(IN) 0.004807692 0.002293660 12.155213   687 0.000000000
4      (IS)-(IS)-(NR) 0.019230769 0.006788125  9.985161    98 0.040322581
5      (FA)-(NR)-(QU) 0.012820513 0.009031434  9.414088   172 0.008064516
       Freq.2     Freq.3    Resid.1   Resid.2   Resid.3
1 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
2 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
3 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
4 0.007936508 0.00000000  2.3951978 -1.292885 -1.544220
5 0.003968254 0.04032258 -0.6614769 -1.241085  2.704727

Computed on 624 event sequences
  Constraint Value
  countMethod  COBJ

Solution

  • The frequencies are actually relative frequencies, i.e. proportions between 0 and 1 rather than percentages. They correspond to the relative support within each group: for each group, they give the proportion of sequences in that group that contain the subsequence. Multiply by 100 to read them as percentages.

    For example, your output shows that the first subsequence, (FA)-(IN)-(FA), never occurs in the first two groups (Freq.1 = Freq.2 = 0) and is a subsequence of about 2.4% of the sequences in the third group (Freq.3 = 0.0242).

    The proportions account for sequence weights when provided.

    Note also that I do not see any negative frequencies in your example output, and a value such as 0.00035 is not below 0: read as a proportion, it simply means 0.035%.
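To make the definition concrete, here is a minimal Python sketch (not TraMineR code) of how a per-group relative support can be computed. It assumes a simplified data model: each sequence is a list of transitions (sets of simultaneous events), groups are plain labels, and a sequence "contains" a subsequence when the subsequence's transitions appear in the same order, gaps allowed. TraMineR's time constraints are ignored, and the helper names `parse`, `contains` and `relative_support_by_group` are hypothetical.

```python
from collections import defaultdict

# Simplified model (assumption): a transition is a frozenset of events
# occurring at the same time stamp; a sequence is an ordered list of them.
Transition = frozenset


def parse(text):
    """Hypothetical helper: parse notation such as "(FA)-(IN,NR)"."""
    return [Transition(part.strip("()").split(",")) for part in text.split("-")]


def contains(sequence, subsequence):
    """True if the transitions of `subsequence` occur in order in `sequence`.

    Each wanted transition must be included in a distinct, later transition
    of the sequence (gaps allowed). Greedy earliest matching is sufficient.
    Time constraints (max gap, window size, ...) are ignored in this sketch.
    """
    pos = 0
    for wanted in subsequence:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def relative_support_by_group(sequences, groups, subsequence, weights=None):
    """Weighted share of the sequences of each group containing `subsequence`.

    Each sequence counts at most once, whatever the number of occurrences.
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    total = defaultdict(float)  # summed weight of all sequences per group
    hits = defaultdict(float)   # summed weight of matching sequences per group
    for seq, grp, w in zip(sequences, groups, weights):
        total[grp] += w
        if contains(seq, subsequence):
            hits[grp] += w
    return {grp: hits[grp] / total[grp] for grp in total}


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    seqs = [parse(s) for s in ["(FA)-(IN)-(FA)", "(FA)-(NR)-(IN)-(QU)-(FA)",
                               "(IS)-(IS)-(NR)", "(NR)-(TR)",
                               "(FA)-(IN)", "(IN)-(FA)"]]
    groups = ["g1", "g1", "g1", "g1", "g2", "g2"]
    freqs = relative_support_by_group(seqs, groups, parse("(FA)-(IN)-(FA)"))
    print(freqs)  # {'g1': 0.5, 'g2': 0.0}
```

In the toy data, two of the four g1 sequences contain (FA)-(IN)-(FA), so its relative frequency in g1 is 0.5, i.e. 50%; read the same way, Freq.3 = 0.02419355 in the output above means about 2.4%. Passing `weights` turns the counts into sums of weights, which is how sequence weights enter the proportions.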