I'm searching for an algorithm that solves the following problem: Say I've got a collection of ordered event sequences and want to create table from them
AB
BC
leading to a table of the form
ABC
**
**
the first line being the headers. For each input sequence I want a row with markers in the columns of the events that occured.
Another more complex example (with three sequences) would be:
AAB
BBA
CBA
leading to
CBBAAB
***
***
** *
I know that sometimes there are multiple possible solutions (e.g. the trivial example of two sequences consisting only one event each I'm free to decide which comes first). I'm only interested in any solution, the resulting header sequence (CBBAAB
in the last example) should be as short as possible.
Doe anyone know a algorithm that solves that problem?
I believe it could be solved in a much simpler way. The problem to me seems like finding first the shortest common supersequence and then to find markers - compute the common subsequence of each string with supersequence.
For example:
AAB
BBA
CBA
Shortest common supersequence between AAB, BBA
is BBAAB
or AABBA
then shortest supersequence between BBAAB
and CBA
is CBBAAB
or between AABBA
and CBA
is CAABBA
Now to find markers for AAB
, find the common subsequence between CBBAAB
and AAB
.
similarly for BBA
and CBBAAB
and as well for CBA
, CBBAAB
Here are some links to help with finding them: