Search code examples
c++performancerecursiontreedepth-first-search

C++ Tree traversal: Why Looping is slower than Recursion in this case?


I need to traverse a tree in my C++ code for a large number of times, the depth of the tree can vary from one iteration to another. I might also conditionally early break from the tree traversal. While profiling my code (using Visual studio compiler), I noticed that the tree traversal part was the biggest bottleneck in my code so I need to speed that part as much as I can.

Below is the description and a simplified runnable version of my code to showcase the issue I currently have.

While using recursion, I noticed that I might be able to speed up my code by conditionally early-breaking from the recursion. However, my implementation of the early-break didn't improve the speed at all (see the code). I thought by using a loop instead of recursion, early-breaking will be easier to implement, so I translated my tree traversal to a loop. Surprisingly, the loop version was an order of magnitude slower than the recursion version! Also, early-break did only improve the speed by 10% at most which is surprising because this is a depth first search traversal, when the break happens, a large part of the tree is not traversed. Hence, at least a speed-up of 50-100% is expected.

My questions:

  1. Why in the specific case below the looped version is an order of magnitude slower?!
  2. Why early break doesn't improve speed much (both for the loop and the recursion)
  3. Any other performance tips for the case below is deeply appreciated.
#include <iostream>
#include <vector>
#include <stack>
#include <chrono>

using namespace std;
using namespace std::chrono;

class Node {
public:
    int id;
    int left = -1;
    int right = -1;
    int count = 0;

    Node(int _id) { id = _id; }
};
std::vector<Node> nodes;

//1) recursive tree traversal
void recursive(int node) {
    if (nodes[node].left == -1) {
        nodes[node].count++;
    }
    else {
        recursive(nodes[node].right);
        recursive(nodes[node].left);
    }
}

//2) recursive tree traversal with conditional break
void recursive2(int node, bool* stop) {
    if (*stop == false) {
        if (nodes[node].left == -1) {
            nodes[node].count++;
            if (rand() % 2 == 0) { *stop = true; } //conditional break
        }
        else {
            recursive2(nodes[node].right, stop);
            if (*stop == false) {
                recursive2(nodes[node].left, stop);
            }
        }
    }
}

// loop traversal
void loop(int node) {
    stack<int> stack;
    stack.push(node);
    while (stack.size() > 0) {
        node = stack.top();
        stack.pop();
        if (nodes[node].left == -1) {
            nodes[node].count++;
            //if (rand() % 2 == 0) { break; } // conditional break
        }
        else {
            stack.push(nodes[node].right);
            stack.push(nodes[node].left);
        }
    }
}


int main()
{
    for (int i = 0; i < 7; i++) {
        nodes.push_back(Node(i));
    }
    // make a simple tree /node 6 is the root
    nodes[4].left = nodes[0].id;
    nodes[4].right = nodes[1].id;
    nodes[5].left = nodes[2].id;
    nodes[5].right = nodes[3].id;
    nodes[6].left = nodes[4].id;
    nodes[6].right = nodes[5].id;


    /// speed comparison 
    int n = 10000000;
    int root_node = 6;

    auto start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { recursive(root_node); }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) {
        bool stop = false;
        recursive2(root_node, &stop);
    }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion with early-break:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { loop(root_node); }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "loop:" << duration.count() << endl;
}

Solution

  • The tree you are traversing is so small compared to the number of iterations you are running that managing the dynamic memory of the stack object dwarves any gains you think (we'll revisit that later) you are getting from doing it in a loop.

    Let's try and remove that memory allocation overhead and see what happens to the speeds.

    For reference, this is what I get out of your code as posted on my local Visual Studio x64-Release build.

    recursion:60
    recursion with early-break:70
    loop:2088
    

    First off, std::stack uses std::deque, which is not great for small stacks. Switching to a vector-backed stack should makes things better. Since it's a one-line change, there's no reason to at least try it:

    std::stack<int, std::vector<int>> stack;
    
    recursion:58
    recursion with early-break:68
    loop:1853
    

    And it does! It's not earth-shattering, but it's a good sign that we're on the right track. If we were worried about the time it takes to do one big traversal, that would probably be good enough. But what we care about is doing 10000000 tiny traversals, so we need to go further than that: Get rid of the memory allocation altogether:

    // I "could" allocate the data as a local variable, 
    // but then it would be on the stack, completely defeating the purpose.
    // This would normally be a recyclable heap-based chunk of memory passed to
    // the function.
    std::array<int, 50> g_stack; // just for the example, don't actually do this.
    void loop(int node) {
      auto top = g_stack.begin();
      auto push = [&](int v) {*(top++) = v;};
      auto pop = [&]() {--top;};
    
      push(node);
      while (top != g_stack.begin()) {
        node = *(top-1);
        pop();
        if (nodes[node].left == -1) {
          nodes[node].count++;
        }
        else {
          push(nodes[node].right);
          push(nodes[node].left);
        }
      }
    }
    
    recursion:61
    recursion with early-break:68
    loop:65
    

    Now we're talking! But it still doesn't beat recursion. What's going on here?

    The premise that a loop is always faster than using recursion is simply not universally true.

    The primary advantage of a loop-based approach over recursion is not speed, it's that it addresses the biggest problem of recursion: the possibility of running out of stack space. By using a loop, you are capable of "recursing" much, much deeper than recursive function calls, because the stack of operation lives on the heap, which typically has a lot more room available.

    Sometimes a compiler does a better job with loop code than with recursion, resulting in a faster runtime, but it's never a given.