data-structures tree time-complexity recursive-datastructures

Computing the nodes of a left skewed tree faster

I have to traverse a tree, in which I need to count the total number of nodes. So, in an easier way, I can traverse the tree and do some count++ to count the nodes of the tree. But this is very time-consuming for a tree having millions of nodes. It will take O(N) time where N is the number of nodes. I want to reduce the time complexity of this approach. How can I do that? For your reference, I am sharing the pseudo-code of my idea.

struct Node{
Node* left;
Node* right;
}
int traverse(Node* node){
  if (node == null) return 0; //base case
  count++;
  count += traverse (node->left) //recursive call
  count += traverse (node->right) //recursive call
  return count;
}

Also please let me know if the above approach is going to be work or not? If not then why? And please suggest any faster approach.

Solution

There is no other way then to count the nodes, i.e. you have to visit them. But you have an undeclared variable count in your recursive function. Instead just start from scratch in each call, as you will only return the count of nodes below that node (and 1 to account for the node itself), irrespective of the other surroundings of that node:

int traverse(Node* node){
  if (node == null) return 0; //base case
  return 1 + traverse(node->left) + traverse(node->right);
}

But if you are going to count nodes repeatedly, at different states of the tree, then you may benefit from storing the count of nodes in each subtree, and keep that information updated with every mutation of the tree. So you would store in each node instance, the number of nodes in the subtree of which it is the root (including itself).

struct Node{
    Node* left;
    Node* right;
    int count;
}

With every insert operation, you increment the count member of all the ancestors of that node, and give the node itself a count of 1. This represents O(logn) time complexity which does not increase the time complexity you probably already have for the insert operation.

With every delete operation, you decrement the count member of all the ancestors of that deleted node. This represents O(logn) time complexity which does not increase the time complexity you probably already have for the delete operation.

When you need to get the count of the whole tree, just read out the root node's count value. This is then just a O(1) operation.