Search code examples
calgorithmred-black-tree

Fastest way to walk from node A to node B in a red-black tree


Say I want to walk from node #154 to node #254 inorder the fastest posible:

The best I can get is: get the first node in the range and walk inorder from this node:

/*
 * Look for a node matching a key range in tree.
 * Returns a pointer to the node if found, else NULL.
 */
struct rb_node *rb_range(struct rb_tree *tree, void *data, int (*func)(const void *, const void *))
{
    struct rb_node *node = rb_first(tree);
    int res;

    while (node != rb_nil(tree)) {
        if ((res = func(data, node->data)) == 0) {
            return node;
        }
        node = res < 0 ? node->left : node->right;
    }
    return NULL;
}

The comparison function looks like:

static int range_comp(const void *range, const void *data)
{
    const long ele = *(const long *)data;
    const long *arr = range;

    if ((ele >= arr[0]) && (ele <= arr[1])) {
        return 0;
    }
    if (ele > arr[0]) {
        return -1;
    } else {
        return +1;
    }
}

And the callback walking the result (there is a comment in order to see how many nodes have been sent to the callback):

static int print_data(void *data, void *cookie)
{
    long value = *(long *)data;
    const long *arr = cookie;

    if (cookie != NULL) {
        if (value > arr[1]) {
            return 1;
        }
        //  if (value >= arr[0]) {
            printf("%ld\n", value);
        //  }
    }
    return 0;
}

The inorder traversal:

/*
 * Call func() for each node, passing it the node data and a cookie;
 * If func() returns non-zero for a node, the traversal stops and the
 * error value is returned.  Returns 0 on successful traversal.
 */
int rb_apply_node(struct rb_tree *tree, struct rb_node *node,
    int (*func)(void *, void *), void *cookie, enum rb_traversal order)
{
    int error;

    if (node != rb_nil(tree)) {
        if (order == preorder)
            if ((error = func(node->data, cookie)) != 0)
                return error;
        if ((error = rb_apply_node(tree, node->left, func, cookie, order)) != 0)
            return error;
        if (order == inorder)
            if ((error = func(node->data, cookie)) != 0)
                return error;
        if ((error = rb_apply_node(tree, node->right, func, cookie, order)) != 0)
            return error;
        if (order == postorder)
            if ((error = func(node->data, cookie)) != 0)
                return error;
    }
    return 0;
}

    long x = rand() % n;
    printf("%d) Printing from %ld to %ld\n", i, x, x + 100);
    long arr[] = {x, x + 100};
    node = rb_range(tree, arr, range_comp);
    if (node != NULL) {
        /* Iterate from node A to node B INORDER */
        rb_apply_node(tree, node, print_data, arr, inorder);
    }

But I get too many results:

0 2 3 4 5 6 8 10 12 13 14 16 17 18 19 20 21 22 23 25 26 27 28 29 31 32 35 36 38 39 40 41 42 43 44 45 47 48 50 51 53 57 59 60 61 62 63 64 65 67 68 72 75 76 77 78 80 81 82 83 85 88 90 91 93 95 96 98 100 101 102 108 109 111 112 113 115 116 117 118 121 122 123 124 125 126 127 128 129 131 132 135 136 137 138 141 143 145 146 148 149 150 153 157 158 161 163 166 168 169 170 172 173 174 175 176 177 178 179 180 182 183 186 188 189 190 192 193 194 195 196 197 198 199 200 201 202 203 205 206 210 212 216 217 218 219 220 221 222 223 225 226 227 228 229 230 231 233 234 235 236 238 239 240 241 244 245 249 252 253 254

How can I restrict the result (as possible) to the selected range (154 to 254)?


Solution

  • First, just consider what you are trying to do and not how you have begun to try to solve it (which is wrong, as you assume incorrectly that next nodes are always in the right subtree, which is not true) To get some kind of iterator on your tree and find a record, then go next, and next, until you get to the other end.

    As the binary tree is not such that a kind of structure, following the simple tree exposed in other answer:

        4
       / \
      2   6
     / \ / \
    1  3 5  7
    

    suppose you have to go from node 3 to node 6. I'll try to deliver an algorithm that allows me to travel from one node to the next, by considering only the characteristics of the node I am on now. The next of node 3 is node 4 which requires us to climb left parents (a left parent will be a node for which we are a right child) until we get to a right parent (similarly, a right parent is a node for which we are a left child). This leads us to the next of a node that has no right subtree.

    Once on node 4 we have a right subtree, so the next node will be the first of this right subtree, which is found by following the left child of the right subtree until we don't get more left children. We then get to node 5.

    In node 5 we don't have right subtree, so we have to climb the tree up searching for the first right parent. That deals us to the node 6. After this, all nodes are greater than the limit, so we finish the search.

    In a structure where you have included provision for a parent node (in case you don't have it, you have to include a stack of nodes from the root of the tree to get the parent) you can have the following functions (they have not been tested as written here, this is only a hint, and has to be tested, which is left as an exercise to the reader)

    struct node {
        int key;
        struct node *parent, *left, *right;
        /* ... more stuff */
    };
    
    /* this routine returns the parent node if it is a left
     * parent, NULL otherwise (covers the no parent case for
     * the root node */
    struct node *left_parent(struct node *n)
    {
        if (!n->parent) return NULL; /* no parent */
        if (n->parent->left == n) return NULL; /* this is a right parent */
        return n->parent;
    }
    
    /* this routine returns the parent node if it is a right
     * parent, NULL otherwise (covers the no parent case for
     * the root node */
    struct node *right_parent(struct node *n)
    {
        if (!n->parent) return NULL;
        if (n->parent->right == n) return NULL; /* this is a left parent */
        return n->parent;
    }
    
    /* get the first key node of the tree rooted at n */
    struct node *first(struct node *n)
    {
        while (n->left) n = n->left;
        return n;  /* this is the first node of the subtree rooted at n */
    }
    
    /* get the last key node of the tree rooted at n */
    struct node *last(struct node *n)
    {
        while (n->right) n = n->right;
        return n; /* this is the last node of the subtree rooted at n */
    }
    
    /* finds the next node, with the algorithm described in the text */
    struct node *next(struct node *n)
    {
        if (n->right) /* we have subnodes */
            return first(n->right);
        else { /* on the parents */
            struct node *nxt;
            while ((nxt = left_parent(n)) != NULL) n = nxt;
            return n->parent; /* can be a right parent or NULL only */
        }
    }
    
    /* finds the prev node, with the algorithm described in the text */
    struct node *prev(struct node *n)
    {
        if (n->left) /* we have subnodes */
            return last(n->left);
        else {
            struct node *prv;
            while ((prv = right_parent(n)) != NULL) n = prv;
            return n->parent; /* can be a left parent or NULL only */
        }
    }
    

    and finally;

    int first, last;
    struct node *iterator, *root;
    ...
    for (iterator = find(root, first); 
         iterator && iterator.key <= last; 
         iterator = next(iterator))
    {
        do_something_to_iterator_node(iterator);
    }
    

    Well, this is not as fast as following a linked list, but almost, and you have the advantage of having the whole tree sorted.

    NOTE

    Just in case you want to save space and not include the parent pointer, the iterator converts from a single pointer to a stack of node pointers, from the one we are visiting now to the root of the tree, and we check for the left_parent and right_parent, by looking at the top node (or next to top, depending if you push the visited node on the stack or not) node.