I am trying to implement some data structures as classes in Python (linked lists, binary trees, tries, etc). For some of these structures, I can either try to implement them as dicts-of-dicts (e.g. the trie has nested layers for its children), or I can implement them as having a "next" variable that contains another instantiation of the same object.
I am wondering what the pros and cons are for a recursive data structure vs storing all of the child data in a member variable. Are there speed or memory benefits? Better caching? Readability?
Below is some example code illustrating what I am talking about, but I am more interested in the theoretical benefits than a critique of my pseudocode :)
class Trie:
def __init__(self) -> None:
self._trie = {}
def insert(self, text: str) -> None:
trie = self._trie
for char in text:
if char not in trie:
trie[char] = {}
trie = trie[char]
trie["END"] = True
def search(self, prefix: str) -> bool:
trie = self._trie
for char in prefix:
if char not in trie:
return False
trie = trie[char]
return True
class RecursiveTrie:
def __init__(self) -> None:
self.children: List[Optional[RecursiveTrie]] = [None] * 26
self.end_of_word = False
def insert(self, key: str) -> None:
"""
If not present, inserts key into trie.
If the key is prefix of trie node, just marks leaf node.
"""
if not key:
self.end_of_word = True
return
index = self.char_to_index(key[0])
if self.children[index] is None:
self.children[index] = RecursiveTrie()
child = self.children[index]
child.insert(key[1:])
def search(self, key: str) -> bool:
""" Search key in the trie. Returns true if key is present in trie. """
if not key:
return True
index = self.char_to_index(key[0])
if self.children[index] is None:
return False
return self.children[index].search(key[1:])
Which of the two approaches IMO it really depends on if the code you want to write is real production code or an exercise.
For production code you should go for the first version, it is smaller and easier to read. It treats dict
as a basic brick (as it is in Python) and nested dicts are no problem at all (consider that even each object attributes in python are in a dict
... using member pointing to another instance for not using a dict doesn't seem at a first sight a good idea).
For understanding tries the second version doesn't depend on dictionaries (explicitly) and it's how you'd write it in C or even C++ where the array could make sense compared to std::map
(it should be proved with profiling on real data and the real production hardware, today CPUs are way too complex to be able to predict performance of complex algorithms).
That knowledge will be useful if you will ever need to implement a trie or a variation on the trie in the future in a low-level language.
The second version in a production python software is just making life harder for colleagues and/or for yourself in the future.