Search code examples
pythonbinary-treepriority-queueheap

Tree-node based implementation (not dynamic array) of max heap in Python


I've seen some resources on implementation of a max-heap in python based on dynamic arrays and indexing. However, I haven't yet seen an OOP tree-node based implementation. I'm wondering if there is a reason for this; would a node based implementation have worse time/space complexity? Is this array implementation just so much more succinct that a tree-node implementation isn't worth the effort? Some other reason?


Solution

  • The reasons are the ones you list. A tree-node based algorithm is less efficient in both space and time.

    Space: An array implementation requires storage of just the data payload, and conceptual tree relationships are determined using trivial arithmetic operations. A tree-node implementation would require additional storage for the parent, left-child, and right-child. Win goes to the array.

    Time: A good heap implementation gains its efficiency by being a complete binary tree. Maintaining completeness is trivial for the array implementation. Let's focus on pushing a new value into an existing tree. When you push data to a zero-based array containing n values, it will be initially be conceptually placed at location n of the array and then bubble up as needed. The bubble up process involves conceptually sliding successive parent values downwards until there's no longer a violation of the heap property, then inserting the new value at that location. This is bounded by log(n), since the tree is complete, but can frequently terminate sooner. In contrast, completeness for a node-based tree is challenging. First you need to identify the node that will become the new node's parent, which requires traversing the tree or maintaining neighbor references for each node or maintaining a pointer to the parent of the nth location. Updating pointers across a sequence of pushes and pops involves additional work, and heaven help you if you screw it up. The bubbling up process has to do pointer updates on left, right, and parent for all traversed nodes, their parents, and their siblings. Once again, the win goes to the array implementation.

    Recycling array locations is easy — just put the new data in the next location for pushes and increment the count (letting python expand the array if needed), and decrement the count but leave the backing array as-is for pops. With a node-based tree, you either need to be constantly allocating/freeing nodes (computationally expensive), or store previously used nodes in an auxilliary container such as a stack to be recycled (additional storage).

    Bottom line: Node based implementations are possible, but they're not worth it.