I keep on running into this situation where I have a trie branch, and I want to match in the middle of it downwards. So for example, I might have this trie branch sort of thing.
foo {
bar {
baz {
hello {
world {
123 {
456 {
abc {
xyz
}
}
}
}
}
}
}
}
This is a much shortened version of it. In reality it could be a binary trie with 100s of levels, such as 10101011011010100110000101......
, as in:
1 {
0 {
1 {
0 {
1 {
...
}
}
}
}
}
But in the simplified example with the string keys, the full path looks like this:
foo/bar/baz/hello/world/123/456/abc/xyz
It is common in tries to basically start from the top of the trie and move partially or all the way down. So you might find a match here, at a partial path.
foo/bar/baz/hello/world/123/
Or you might find one here:
foo/bar/baz/
That's easy with tries, you just start from the top and work your way down. What these all have in common is they start at the top of the branch.
But what I'm wondering is different. I am wondering how to start at the middle of the trie somewhere. So for example, I want to match like this:
/world/123/456/
Basically like a regular expression */world/123/456/*
, where it matches in the middle.
The thing is, if the trie is dense, then there could theoretically be thousands or even a million nodes scattered throughout the trie. So matching 5 layers down as in /world/123/456/
might mean scanning 1000's of upper-level trie nodes before we find the match.
I'm wondering what you do in this situation, what a possible solution is. All I can think of currently is somehow making that middle-of-the-branch its own top-level trie somewhere, duplicating the nested part of the trie in another place in memory. But this seems really inefficient and wasteful space and memory-wise, which is why I'm wondering how you might go about solving this.
Each node in the trie is technically still a trie. You can treat it as the root of that subtree.
You can exploit this by keeping a hash table that maps values of each node to the corresponding nodes in the trie. If nodes can have duplicate values then make each value map to a list of nodes.
If you need to search for a value in the middle of the trie you can use the hash table to immediately jump to the nodes in the trie that start with your starting value. Then for each of those nodes you can search for your value as if that node was the root of a top level trie somewhere.