If I understand correctly, modifying (insertion or deletion) a Binary Search Tree in Haskell requires copying the whole tree, so practically making it being O(n). Is there a way to implement it in O(log n) or maybe compiler would optimize O(n) insertion down to O(log n) "under the hood"?
If I understand correctly, modifying (insertion or deletion) a Binary Search Tree in Haskell requires copying the whole tree, so practically making it being O(n).
You do not need to copy the entire tree. Indeed, let us work with a simple unbalanced binary search tree, like:
data Tree a = Node (Tree a) a (Tree a) | Empty deriving (Eq, Show)
then we can insert a value with:
insertIn :: Ord a => a -> Tree a -> Tree a
insertIn x = go
where go Empty = Node Empty x Empty
go n@(Node l v r)
| x < v = Node (go l) v r
| x > v = Node l v (go r)
| otherwise = n
Here we reuse r
in case we construct a Node (go l) v r
, and we reuse l
in case we construct a Node l v (go r)
. For each node we visit, we create a new node where one of the two subtrees is used in the new node. This means that the new tree will point to the same subtree objects as the original tree.
In this example, the amount of new nodes thus scales with O(d) with d the depth of the tree. If the tree is fairly balanced, than it will insert in O(log n).
Of course you can improve the algorithm and define an AVL tree or red-black tree by storing more information in the node regarding balancing, in that case you thus can guarantee O(log n) insertion time.
The fact that all data is immutable here helps to reuse parts of the tree: we know that l
and r
can not change, so the two trees will share a large amount of nodes and thus reduce the amount of memory necessary if you want to use both the original and the new tree.
If there is no reference to the old tree necessary, the garbage collector will eventually collect the "old" nodes that have been replaced by the new tree.