Search code examples
javaperformanceheapcpu-cache

Why is a linear search through a heap data structure faster than tree traversal?


In the following code, indexOfSlow is around 17 times slower than indexOf in my test cases, although the former does fewer comparison than the latter.

Why is that?

(My hunch is that the superior performance of indexOf is due to sequential array access which has e.g. caching benefits - but I'd like to know for sure)

Performance numbers from my machine:

| elementCount | indexOf msec | indexOfSlow msec | slow/fast |
|--------------+--------------+------------------+-----------|
|        10000 |           41 |              852 |        21 |
|        20000 |          232 |             3387 |        15 |
|        30000 |          375 |             6642 |        18 |
|        40000 |          710 |            12699 |        18 |
|        50000 |         1197 |            19182 |        16 |
|--------------+--------------+------------------+-----------|
|      average |              |                  |      17.6 |

N.B. Java's PriorityQueue.indexOf() is similar to my indexOf, so it does not try to walk the tree and stop early like my indexOfSlow does.

class Heap<T extends Comparable<T>> {

    public static void main(String[] args) {
        int elementCount = 50_000;

        final List<Integer> elements = new ArrayList<>(elementCount);
        for (int i = 0; i < elementCount; i++)
            elements.add(i);

        for (int j = 0; j < 3; j++) {
            final var heap = new Heap<Integer>();
            for (int i : elements)
                heap.add(i);
            assert heap.peek() == 0;
            final long nanoBefore = System.nanoTime();
            for (int i = elementCount; i > 0; i--)
                heap.indexOf(i - 1);
//                heap.indexOfSlow(i - 1);
            final long nanoAfter = System.nanoTime();
            if (j > 0) // discard first round as warmup
                System.out.println("Time taken: " + (nanoAfter - nanoBefore) / 1_000_000 + " msec");
        }
    }

    private final ArrayList<T> list = new ArrayList<>();

    public T peek() {
        return list.isEmpty() ? null : list.get(0);
    }

    private void siftUp(int i, T value) {
        while (i > 0) {
            int parentI = (i - 1) / 2;
            final T parentValue = list.get(parentI);
            if (parentValue.compareTo(value) <= 0)
                return;
            list.set(parentI, value);
            list.set(i, parentValue);
            i = parentI;
        }
    }

    public void add(T value) {
        list.add(value);
        siftUp(list.size() - 1, value);
    }

    public int indexOf(T value) {
        final int size = list.size();
        if (size > 0) {
            for (int i = 0; i < list.size(); i++) {
                if (value.compareTo(list.get(i)) == 0)
                    return i;
            }
        }
        return -1;
    }

    public int indexOfSlow(T value) {
        final int size = list.size();
        if (size > 0) {
            Queue<Integer> childrenToVisit = new LinkedList<>();
            childrenToVisit.add(0);
            while (!childrenToVisit.isEmpty()) {
                int i = childrenToVisit.poll();
                final int cmp = list.get(i).compareTo(value);
                if (cmp == 0)
                    return i;
                if (cmp > 0)
                    continue;
                int rightChildIdx = (i + 1) * 2;
                int leftChildIdx = rightChildIdx - 1;
                if (leftChildIdx < size)
                    childrenToVisit.add(leftChildIdx);
                if (rightChildIdx < size)
                    childrenToVisit.add(rightChildIdx);
            }
        }
        return -1;
    }

}

Solution

  • First of all, the results obtained with

    final long nanoBefore = System.nanoTime();
    //...
    final long nanoAfter = System.nanoTime();
    

    are not reliable at all. To get reliable results you need to use JMH.

    Second, in indexOfSlow() you use LinkedList which is highly ineffective.

    Third, in indexOf() you don't allocate any memory, iterating over the same list again and again. In indexOfSlow() you allocate a new LinkedList for each method invocation. This obviously costs us time and GC.

    Fourth, in spite of the fact that in theory linear search has complexity O(n) in practice in your example you are searching over ArrayList which is array-based and extremely fast when you are accessing elements by index (even if you need to iterate over the whole list). In the indexOfSlow() on the contrary you rely on continuous poll()/add() operations. Imagine you are looking for 49733, then for the 1, 2, 3 etc. values you execute both poll() and add()operation. As a result before returning the value you are looking for your childrenToVisit list is modified 99734 (!!!) times. So this algorithm is extremely ineffectively implemented and this is why you have such a dramatic difference in numbers.