Search code examples
javascripttypescriptalgorithmbinary-search-treedepth-first-search

Why does performing DFS with this code result in duplicate leaves?


I am writing an algorithm that discerns whether two trees have the same leaves.

These have the same leaf numbers in the same order so this returns true

These have the same leaf numbers in the same order so this returns true.

This is the code I wrote:

function leafSimilar(root1: TreeNode | null, root2: TreeNode | null): boolean {

    console.log(DFS(root1))

    const leavesRoot1 = DFS(root1);
    const leavesRoot2 = DFS(root2);

    for (let i = 0; i < Math.max(leavesRoot1.length, leavesRoot2.length); i += 1) {
        if (leavesRoot1[i] !== leavesRoot2[i]) {
            return false;
        }
    }

    return true;
};

function DFS(root, leaves = [] ) {

    if(!root) return leaves; 

    if (!root.left && !root.right) {
        leaves.push(root.val);
        return leaves;
    }

    // return DFS(root.left).concat(DFS(root.right)); // this is the correct answer

    return DFS(root.left, leaves).concat(DFS(root.right, leaves)); // why doesn't this work?
}

The last line in the code is what i initially thought, but it is wrong.

I couldn't draw it in my mind so I logged them as seen in the second line.

It logs:

[
  6, 7, 4, 
  6, 7, 4, 
  6, 7, 4, 
  6, 7, 4, 9, 8,
  6, 7, 4, 9, 8
] 

After 2 hours, I cannot figure out why this is.

I would think it should be at least something similar to this:

[6, 
 6, 7, 
 6, 7, 4, 
 6, 7, 4, 9,
 6, 7, 4, 9, 8,
]

or just

[6,7,4,9,8]

which is the correct one.

Would someone be able to explain to me why?

The DFS function takes leaves array argument from the previous call.

This means that it should receive leaves from the node above, not below, so there shouldn't be a repeat pattern in the leaves array because it is empty.

The DFS is in preorder according to the code I wrote, so the left-most nodes are evaluated first.

Please help me understand.


Solution

  • Your approach would work if you would have done it this way:

        DFS(root.left, leaves);
        DFS(root.right, leaves);
        return leaves;
    

    Here we ignore the array returned by DFS. Realise that when the first recursive call returns, leaves has some values from the left subtree. The second recursive call extends this array with more values. The right thing to do is to return leaves after that second population has happened.

    Be aware that DFS returns a reference to the values array, so in essence each call returns the same reference. If you do values.concatenate(values) it is clear you duplicate values, but that is exactly what you are doing with:

    DFS(root.left, leaves).concat(DFS(root.right, leaves));
    

    It just is a bit more obscure... because values is getting populated in both recursive calls. But that is quite irrelevant to explain the problem. The key understanding is that both recursive calls return a reference to the same array.

    To avoid such problems, it is good practice to never have a function return the reference to an object that was also passed as argument, and which it mutates. It is better to choose between these two patterns:

    • Have the function take the object (array in this case) as argument, but return undefined. This way the caller is made more aware that their argument has been mutated.

    • Have the function not take the object as argument, but have it construct one from scratch, which it then can return to the caller. In this pattern each call of the function returns a distinct object reference.

    Mixing these two patterns can easily lead to the problems you have encountered. In this actual algorithm, I have a clear preference for the second pattern, and then there shouldn't be a values argument. So:

    function DFS(root) {
        if (!root) return []; 
        if (!root.left && !root.right) return [root.val];
        return DFS(root.left).concat(DFS(root.right));
    }
    

    The downside is that a lot of little arrays are created here, and concatenating all of them can be costly.

    More elegant is to make use of a generator function. You get the best of both worlds: you avoid creating little arrays from scratch each time, and you avoid passing an array as argument that mutates.

    function leafSimilar(root1, root2) {
        const leavesRoot1 = [...DFS(root1)]; // consume the iterator
        const leavesRoot2 = [...DFS(root2)];
    
        for (let i = 0; i < Math.max(leavesRoot1.length, leavesRoot2.length); i += 1) {
            if (leavesRoot1[i] !== leavesRoot2[i]) {
                return false;
            }
        }
        return true;
    };
    
    function* DFS(root) {
        if (!root) return; 
        if (!root.left && !root.right) yield root.val;
        yield* DFS(root.left);
        yield* DFS(root.right);
    }