Usually in Burrows-Wheeler Transform algorithm, a $ character is used to signal the end of string, but in so many cases, this $ is omitted.
I was wondering how it can be reversed without knowing the position of the last character?
For example, I have this BWT:
[[[[[1[[11endgnad1234245ndbnbbb]]]]]]]nnnngnabbbdiaaaiaaii
Following the algorithm, I can easily construct the first column of the BWT matrix, which I choose to represent in a compressed way such as below:
Character : Occurrences
1 : 4
2 : 2
3 : 1
4 : 2
5 : 1
[ : 7
] : 7
a : 7
b : 7
d : 4
e : 1
g : 2
i : 4
n : 9
Without knowing which character is the last in the original string, I'm unable to see how I could reconstruct the original string.
Any help is greatly appreciated. Thang
P/S: In case you're wondering what the original string is:
[1]ban[2]banana[3]band[4]bandage[12]bin[14]bind[15]binding
You can't (but you can try ;-). Your 1st bwt symbol is the last in the original string 'S'. Now you should unroll the original string backward through LF mapping. It's actually bin[sym] + rank(sym, i) + 1 where you start with i = 0. You can easy get bin[] array from occurences. The problem is that once your 'i' is bigger then omitted '$' you shouldn't add this last '1' so you break the string and things go nasty. You can detect the error if you also reconstruct sa[] and overwrite already set index. So you can set arbitrary $ position to '0' and try to recover, then if it fails set it to 1... until you reconstruct correctly. don't know if this could be optimized.
Cheers,
D.