Search code examples
c#treedepth-first-search

Sales Path Tree Problem DFS implementation


Can somebody explain to me what the error in my code is? I'm trying to solve this problem as practice, and I keep getting a stack overflow error. I've been looking over it and I just can't seem to find where I'm overflowing the stack in my code. Some things to note:

  • I don't actually understand in my code how the children nodes I've created in my test case "know" who their parent node is--because of this, I added those lines assigning their parents to the root node just in case. I understand this probably doesn't look very nice but I just wanted to figure out my main issue!
  • I originally initialized the values of parent and children to be null when creating a new SalesNode object, however I was coming across NullReferenceExceptions, so I also changed this part as a safety net.

Problem:

The car manufacturer Honda holds their distribution system in the form of a tree (not necessarily binary). The root is the company itself, and every node in the tree represents a car distributor that receives cars from the parent node and ships them to its children nodes. The leaf nodes are car dealerships that sell cars direct to consumers. In addition, every node holds an integer that is the cost of shipping a car to it.

Take for example the tree below:

              0
          /   |   \ 
         5    3      6
        /    /  \    / \
       4    2    0   1   5
           /     /
           1    10 
           \
            1 

A path from Honda’s factory to a car dealership, which is a path from the root to a leaf in the > tree, is called a Sales Path. The cost of a Sales Path is the sum of the costs for every node in the path. For example, in the tree above one Sales Path is 0→3→0→10, and its cost is 13 (0+3+0+10).

Honda wishes to find the minimal Sales Path cost in its distribution tree. Given a node rootNode, write a function getCheapestCost that calculates the minimal Sales Path cost in the tree.

Implement your function in the most efficient manner and analyze its time and space complexities.

For example:

Given the rootNode of the tree in diagram above

Your function would return:

7 since it’s the minimal Sales Path cost (there are actually two Sales Paths in the tree whose cost is 7: 0→6→1 and 0→3→2→1→1)

My code:

using System;
using System.Collections;

class SalesNode
{
    public int cost;
    public SalesNode[] children;
    public SalesNode parent;

    public SalesNode()
    {
        children = new SalesNode[0];
        parent = new SalesNode();
    }
}

class Solution
{
    static void Main(string[] args)
    {
        SalesNode rootNode = new SalesNode();
        rootNode.children = new SalesNode[3];
        rootNode.cost = 0;
        rootNode.children[0] = new SalesNode {cost = 5};
        rootNode.children[1] = new SalesNode {cost = 3};
        rootNode.children[2] = new SalesNode {cost = 6};
        rootNode.children[0].parent = rootNode;
        rootNode.children[1].parent = rootNode;
        rootNode.children[2].parent = rootNode;
        Console.WriteLine(getCheapestCost(rootNode));
    }
    
    public static int getCheapestCost(SalesNode rootNode) {
        int minSalesPathCost = Int32.MaxValue;
        int currentPathCost = 0;
        // Create a stack to implement DFS
        Stack salesNodeStack = new Stack();
        SalesNode currentNode;
        
        salesNodeStack.Push(rootNode);
        
        while (salesNodeStack.Count > 0) { // Stack is not empty
            currentNode = (SalesNode) salesNodeStack.Pop();
            currentPathCost += currentNode.cost;
            
            if (currentNode.children.Length == 0) { // Current node is a leaf
                minSalesPathCost = Math.Min(minSalesPathCost, currentPathCost);
                do { // Keep subtracting costs until back to parent node of next unexplored path
                    currentPathCost -= currentNode.cost;
                    currentNode = currentNode.parent;
                } while (currentNode != rootNode && currentNode.children.Length == 1);
                
            } else { // Current node has children
                foreach (SalesNode child in currentNode.children) {
                    salesNodeStack.Push(child);
                }
            }
        }
        
        return minSalesPathCost;
    }
}

I greatly appreciate any insight. Thanks guys!


Solution

  • The reason for the stack overflow is the initialisation of the parent field in the SalesNode constructor. You create a new instance of SalesNode by calling its constructor with new SalesNode(). The constructor in turn creates a new instance of the SalesNode in order to initialise the parent field, thereby calling itself. The original call to the constructor never completes, because it's stuck in a loop, trying to create an infinitely long chain of SalesNode instances, each of which has another SalesNode instance as its parent property, and eventually the stack is completely full of references to where control should return to if/when one of those calls to the constructor ever completes. Curiously, when I tried your code (using .net core 3.1 in Visual Studio 2019), the stack overflow exception didn't have a stack trace, so I can see why that would be difficult to debug.

    The first question you should ask yourself is whether a SalesNode really needs to know what its parent node is. If it does, then don't instantiate the parent node in the child node's constructor, instead change the constructor's signature so that you can pass it a parent SalesNode which you've already instantiated. That solves the stack overflow exception.

    public SalesNode(SalesNode parent)
    {
        this.children = new SalesNode[0];
        this.parent = parent;
    }
    

    However, I'm not convinced that a SalesNode does need to know what its parent node is. It looks like the intent might be for the salesNodeStack to keep track of where the getCheapestCost method is in the tree and therefore which node to go to if it has reached a leaf node, checked whether it's the cheapest, and then needs to navigate up to the parent in order to evaluate the cost of another path. But that's not how the method is actually implemented.

    The test data you're using is a tree like this

        0
        |
    +---+---+
    |   |   |
    5   3   6
    

    Having commented out SalesNode's parent field and its initialisation, stepping through the code in the debugger, I see that on entering the while (salesNodeStack.Count > 0) loop, currentNode is set to the root node. OK so far.

    This node has 3 children, so control goes to the else block, which pushes each its 3 children onto the salesNodeStack and then passes control back to the start of the while (salesNodeStack.Count > 0) loop. This time, currentNode is set to the last child node which was pushed onto the salesNodeStack, the one with cost 6, and because it doesn't have any children, minSalesPathCost is correctly set to 6 + 0, as this is the first leaf node to be evaluated, and therefore it's the cheapest seen so far.

    The inner loop's implementation doesn't seem to match the intent described by the comment.

    do
    { // Keep subtracting costs until back to parent node of next unexplored path
        currentPathCost -= currentNode.cost;
        //currentNode = currentNode.parent;
        currentNode = (SalesNode)salesNodeStack.Pop();
    } while (currentNode != rootNode && currentNode.children.Length == 1);
    

    I've commented out the reference to currentNode.parent because I've removed that field from the SalesNode class to solve the stack overflow exception, and instead tried to pop the parent node off the salesNodeStack, but that doesn't actually work because the next item on the stack isn't the node's parent, it's the node's previous sibling. And because the previous sibling doesn't have exactly 1 children (it has none), this loop exits. So we're back to the start of the while (salesNodeStack.Count > 0) loop again.

    Now the first child of the root node is popped off the stack (we've completely skipped evaluating the node with cost 3) and we evaluate its cost, which comes to 5, which is correct if you ignore the fact that we've ignored the node with cost 3.

    But then currentNode = (SalesNode)salesNodeStack.Pop() throws an InvalidOperationException because the salesNodeStack is now empty, so there's nothing to pop.

    The two aspects of your solution that I think you need to concentrate on (other than the fact that the SalesNode constructor always tries to instantiate another SalesNode) are...

    foreach (SalesNode child in currentNode.children)
    {
        salesNodeStack.Push(child);
    }
    

    Rather than pushing all the (immediate) children of one node onto the stack, maybe you should be navigating to a child node, pushing that onto the stack, checking whether that child node has children and if so navigating to one of them and pushing it onto the stack, and repeat until you've reached a leaf node?

    } while (currentNode != rootNode && currentNode.children.Length == 1);
    

    The exit condition for this inner loop doesn't actually check whether we're back to a node which is a parent of a path that hasn't been visited yet (which is what currentPathCost -= currentNode.cost implies that it should do). Exactly what this condition should be will depend on how you implement the navigation all the way down to a leaf node before evaluating the cost of the path. You probably need to add a property/field to the SalesNode class to indicate whether or not it's already been visited.

    A few other things you may want to consider...

    Your test data is a tree which only has one level of child nodes. I don't think this is enough to properly test your code. The tree posted in your question would be a much better test case - try to implement this in your test data.

                  0
                  |
            +-----+-------+
            |     |       | 
            5     3       6
            |     |       |
            +   +-+-+     +-+-+
            |   |   |     |   |
            4   2   0     1   5
                |   |
                +   +
                |   |
                1  10
                |
                + 
                |
                1 
    

    You could change the type of salesNodeStack from System.Collections.Stack to System.Collections.Generic.Stack<SalesNode>. It doesn't make any functional difference, but it's type safe; it guarantees that everything in the stack is an instance of SalesNode and saves you having to cast each one to SalesNode after popping it off the stack.

    You could also change the type of SalesNode's children field from an array to a System.Collections.Generic.List<SalesNode>. Like the Stack<T> class, it's type safe, and unlike an array it can grow and shrink as you add items to it or remove items from it.