I am trying to implement a rope tree as an alternative data structure for a string.
The wikipedia page is unclear about the rules for splitting the tree, but these rules seemed to work at first. However after a few split operations I ended up with an invalid tree:
6
/\
/ \
4 \
/\ \
/ \ \
/ 2 4
/ /\ /\
4 2 3 4 7
A B C D E
The numbers represent the weight of the nodes, and the length of the substring in case of a leaf. In this malformed tree, the substring C can never be reached.
Example of a good tree. Every character can be reached, following the explanation on wikipedia.
6
/\
/ \
/ 7
/ /\
4 3 \
/ \ / \ \
4 2 3 4 7
A B C D E
I have no CS background, so I don't know what is wrong with the tree. I don't even know how to express the problem with this tree properly. What is wrong with this tree (in CS terms) and how can I solve it?
The root violates the following invariant:
Each node's weight is the sum of all leafs' weights in its left subtree.
Your second tree fixes the invariant by changing the structure, but this is not necessary. Here's the corrected version using different weights with the same structure:
r: 9
/\
/ \
a: 4 \
/\ \
/ \ \
/ b: 2 4
/ /\ /\
4 2 3 4 7
A B C D E
To reach the first character in C
(the one at position 7 if we assume 1-based indexing), you'll run Index(r, 7)
according to the Wikipedia article. Here's the "Log":
Index(a, 7)
Index(b, 3)
Index(C, 1)
C
Addendum:
Note that the Wikipedia article proposes the following (different!) invariant:
Each node has a "weight" equal to the length of its string plus the sum of all the weights in its left subtree.
This formulation doesn't match the picture immediately above it:
According to the above invariant, node B
in the picture would have to have a weight of 15.