Search code examples
algorithmdata-structuresarray-algorithms

Data Structures: If Heaps are trees, why are they implemented internally with Lists or arrays?


I'm giving myself a refresher course in data structures and algorithms (and learning new stuff -- I was an Information Systems major in college instead of Computer Science, so I didn't get a formal education in these things), and I've been working on heaps. I'm a little confused.

My understanding is that a Heap is basically a semi-sorted tree, where the value of each child node is guaranteed to be less than the value of its parent (assume for this discussion MinHeaps). So if it's a tree, why is it that every implementation I've seen has used an array-like structure internally instead of building a set of tree nodes?

It seems weird to me that I have to memorize that the children of N in an array sit at 2N + 1 (left) and 2N + 2 (right)*. Why not just build a node with Left and Right properties and go from there?

*Source: This article


Solution

  • TL;DR: Save on memory overhead, get more speed from data locality.

    For a binary tree you need in each node 4 bytes for left child and 4 byte for the right child (or 8+8 if you are on 64 bit system). That's just the bare pointers you need. If you store a 32 bit int that's a lot of overhead. Add another pointer for the parent that is needed for pushing a node towards the root and you are looking at 24 bytes of overhead for a 4 byte integer on a 64bit system.

    For a heap you don't need to worry about arbitrary trees. You usually only worry about the head (the min/max of the values) and you don't care about the internal structure. The heap is an almost complete binary tree (all levels are filled except the last one which is filled left to right). In this structure if you just put the nodes in an array, then for a node at index x you always find the parent at (x+1)/2 the left child at x*2+1 and the right child at x*2+2. So no need to store any of those fat pointers.

    In addition to the space saved you also get a speed boost because the memory is contiguous so it's more likely that it will be cached together (not guaranteed, just more likely).

    Sure, if it not something where efficiency is important, you can implement it as a regular tree. And conversely, if you have a almost complete tree and you want to the most out of you system implement it with an array (even if you don't use it as a heap).