string algorithm data-structures parallel-processing trie

Parallel algorithm for constructing a trie?

Because the trie data structure has such a huge branching factor and each subtree is totally independent of the others, it seems like there ought to be a way to enormously speed up trie construction for a given dictionary by adding all the words in parallel.

My initial idea on how to do this was the following: associate a mutex with each pointer in the trie (including the pointer to the root) and then have each thread follow the normal algorithm for inserting a word into the trie. However, before following any pointers, a thread must first acquire the lock on that pointer so that if it needs to add a new child node to the trie, it can do so without introducing any data races.

The catch with this approach is that it uses an enormous number of locks - one for each pointer in the trie - and does an enormous number of acquires and releases - one for each character in each input string.

Is there a way to build a trie in parallel without using nearly as many locks?

Solution

An obvious lock-free algorithm would be:

Bucket-sort the input strings by a length-k prefix (ordinarily, k=1, but with small alphabets, increase k).
For each letter, construct a trie containing the k-suffix of all the strings starting with that letter.
Merge the tries from the previous step (when k=1, just add a root node).

Assuming a uniform distribution of prefixes, this can give you a linear speedup up to the size of the alphabet to the power k.