Search code examples
data-structurestreebinary-treemultiway-tree

What is the left-child, right-sibling representation of a tree? Why would you use it?


Many data structures store multi-way trees as binary trees using a representation called the "left-child, right-sibling" representation. What does this mean? Why would you use it?


Solution

  • The left-child, right-sibling representation (LCRS) is a way of encoding a multi-way tree (a tree structure in which each node can have any number of children) using a binary tree (a tree structure in which each node can have at most two children).

    Motivation

    To motivate how this representation works, let's begin by considering a simple multi-way tree, like this one here:

                       A
                     //|\ \
                   / / | \  \
                  B C  D  E  F
                  |   /|\   / \
                  G  H I J K   L
    

    (Apologies for the low-quality ASCII artwork!)

    In this tree structure, we can navigate downward from any node in the tree to any of its children. For example, we can migrate from A to B, A to C, A to D, etc.

    If we wanted to represent a node in a tree like this one, we would normally use some sort of node structure / node class like this one here (written in C++):

    struct Node {
        DataType value;
        std::vector<Node*> children;
    };
    

    In the LCRS representation, we represent the multi-way tree in a way where each node requires at most two pointers. To do this, we will slightly reshape the tree. Rather than having each node in the tree store pointers to all of its children, we will structure the tree in a slightly different way by having each node store a pointer to just one of its children (in LCRS, the leftmost child). We will then have each node store a pointer to its right sibling, the next node in the tree that is the child of the same parent node. In the case of the above tree, we might represent the tree in the following way:

                A
               /
              /
             /
            B -> C -> D -> E -> F
           /         /         /
          G         H->I->J   K->L
    

    Notice that in this new structure it is still possible to navigate from a parent node to its kth child (zero-indexed). The procedure for doing so is the following:

    • Descend into the left child of the current node. (This is the first node in the list of its children).
    • Follow that child's right sibling pointer k times. (This takes you to the kth child of the node).
    • Return that node.

    For example, to find the third (zero-indexed child) of the root node A, we would descend to the leftmost child (B), then follow three right links (visiting B, C, D, and finally E). We then arrive at the node for E, which holds the third child of node A.

    The main reason for representing the tree this way is that even though any node may have any number of children, the representation requires at most two pointers for each node: one node to store the leftmost child, and one pointer to store the right sibling. As a result, we can store the multiway tree using a much simpler node structure:

    struct Node {
        DataType data;
        Node* leftChild;
        Node* rightSibling;
    };
    

    This node structure has exactly the same shape of a node in a binary tree (data plus two pointers). As a result, the LCRS representation makes it possible to represent an arbitrary multiway tree using just two pointers per node.

    Analysis

    Let us now look at the time and space complexity of the two different representations of the multiway tree and some basic operations on it.

    Let's begin by looking at the total space usage required for the initial "dynamic array of children" representation. How much total memory usage will there be for an n-node tree? Well, we know the following:

    1. Each node, regardless of its number of children, contributes space for the raw data stored (sizeof(data)) and the space overhead of the dynamic array. The dynamic array (typically) has one pointer stored (which points at the allocated array), one machine word for the total number of elements in the dynamic array, and (optionally) one machine word for the allocated capacity of the array. This means that each node takes up sizeof(Data) + sizeof(Node *) + 2 * sizeof(machine word) space.

    2. Across all of the dynamic arrays in the tree, there will be n - 1 pointers to children, since of the n nodes in the tree n - 1 of them have parents. That adds in an extra (n - 1) * sizeof(Node *) factor.

    Therefore, the total space usage is

    n · (sizeof(Data) + sizeof(Node*) + 2 * sizeof(machine word)) + (n - 1) * sizeof(Node *)

    = n * sizeof(Data) + (2n - 1) * sizeof(Node*) + 2n * sizeof(machine word)

    Now, let's contrast that with an LCRS tree. Each node in an LCRS tree stores two pointers (2 * sizeof(Node*)) and one data element (sizeof(Data)), so its total space is

    n * sizeof(Data) + 2n * sizeof(Node *)

    And here we see the win: notice that we are not storing 2n * sizeof(machine word) extra memory to keep track of allocated array sizes. This means that the LCRS representation uses considerably less memory than a regular multiway tree.

    However, basic operations on the LCRS tree structure tend to take longer than their corresponding operations on the normal multi-way tree. Specifically, in a multi-way tree represented in the standard form (each node stores an array of child pointers), the time required to navigate from one node to its kth child is given by the time required to follow a single pointers. On the other hand, in the LCRS representation, the time required to do this is given by the time required to follow k + 1 pointers (one left child pointer, then k right child pointers). As a result, if the tree has a large branching factor, it can be much slower to do lookups in an LCRS tree structure than in the corresponding multiway tree structure.

    We can therefore think of the LCRS representation as offering a time-space tradeoff between data structure storage space and access times. The LCRS representation has less memory overhead than the original multiway tree, while the multiway tree gives constant-time lookups of each of its children.

    Use Cases

    Because of the time-space tradeoff involved in the LCRS representation, the LCRS representation is typically not used unless one of two criteria hold:

    1. Memory is extremely scarce, or
    2. Random access of a node's children is not required.

    Case (1) might arise if you needed to store a staggeringly huge multiway tree in main memory. For example, if you needed to store a phylogenetic tree containing many different species subject to frequent updates, then the LCRS representation might be appropriate.

    Case (2) arises in specialized data structures in which the tree structure is being used in very specific ways. For example, many types of heap data structures that use multiway trees can be space optimized by using the LCRS representation. The main reason for this is that in heap data structures, the most common operations tend to be

    1. Remove the root of a tree and process each of its children, or
    2. Join two trees together by making one tree a child of the other.

    Operation (1) can be done very efficiently in the LCRS representation. In an LCRS representation, the root of the tree never has a right child (since it has no siblings), and therefore removing the root simply means peeling off the root node and descending into its left subtree. From there, processing each child can be done by simply walking down the right spine of the remaining tree and processing each node in turn.

    Operation (2) can be done very efficiently as well. Recall from above that in an LCRS representation, the root of a tree never has a right child. Therefore, it is very easy to join together two trees in LCRS representation as follows. Beginning with two trees like this:

        R1             R2
       /               /
     (children 1)    (children 2)
    

    We can fuse the trees together in this way:

                 R1
                /
               R2
              /  \
    (children 2) (children 1)
    

    This can be done in O(1) time, and is quite simple. The fact that the LCRS representation has this property means that many types of heap priority queues, such as the binomial heap or rank-pairing heap are typically represented as LCRS trees.

    Hope this helps!