arrays algorithm data-structures binary-search-tree binary-search

Find the index of the farthest smaller number in the right side of an array

Given an array of size N. For every element in the array, the task is to find the index of the farthest element in the array to the right which is smaller than the current element. If no such number exists then print -1

This question is taken from here

Sample Test Cases

Input 
3, 1, 5, 2, 4
Output
3, -1, 4, -1, -1

Input
1, 2, 3, 4, 0
Output 
4, 4, 4, 4, -1

I would also like to clarify that this is not a duplicate of this post here. While I did understand the solution mentioned in the post, I would really like to know why the above approach does not work for all test cases.

I came up with the following approach,

Create a binary search tree from the right side of the array
Each node stores the following info - the value, the index of the current element and the index of the smallest element which is farthest away from it's right side
While inserting, check if the current element being inserted (while moving to the right subtree) satisfies the condition and update the farthestDst accordingly

I tried to submit this, but I got Wrong Answer (failing test case not shown) despite running successfully against some sample test cases. I have attached my code in C++ below

class TreeNode{
public:
    // farthestDst is the index of the smallest element which is farthest away from it's right side
    int val,idx,farthestDst;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int value, int index, int dst){
        val = value;
        idx = index;
        farthestDst = dst;
        left = right = NULL; 
    }
};

class Solution{   
  public:
    TreeNode* root = NULL;
    unordered_map <int,TreeNode*> mp; // store address of each node just to speed up search
    TreeNode* insertBST(TreeNode* root, int val, int idx, int dst){
        if(root == NULL){
            TreeNode* node = new TreeNode(val,idx,dst);
            mp[val] = node;
            return node;
        }
        else if(val >= root->val){ // checking the condition
            if((root->idx)-idx > dst){
                dst = root->idx;
            }
            root->right = insertBST(root->right,val,idx,dst);
        }
        else{
            root->left = insertBST(root->left,val,idx,dst);
        }
        return root;
    }
    // actual function to complete where N is the size of the vector and nums contains the values
    vector<int> farNumber(int N,vector<int> nums){
        vector<int> res;
        if(nums.size() == 0){ // base case check if nums is empty
            return res;
        }
        for(int i = nums.size()-1; i >= 0; i--){
            root = insertBST(root,nums[i],i,-1);
        }
        for(int i = 0; i < nums.size(); i++){
            TreeNode* node = mp[nums[i]]; 
            res.push_back(node->farthestDst);
        }
        return res;
    }
};

Just a note, if anyone wants to test their solution, they can do so at this link

Please do let me know if further clarification about the code is needed

Any help would be appreciated. Thanks!

Solution

mp[] assumes that each element value appears at most once in the input. This is not given as part of the problem description, so it's not guaranteed. If some value appears more than once, its original value in mp[] will be overwritten. (Ironically, most C++ standard libraries implement unordered_map<T> as a balanced BST -- an AVL tree or red-black tree.)

Not technically a bug, but as pointed out by nice_dev in a comment, because your BST performs no rebalancing, it can become arbitrarily badly balanced, leading to O(n) insertion times for O(n^2) performance overall. This will occur on, e.g, sorted or reverse-sorted inputs. There are probably test cases large enough to cause timeouts for O(n^2)-time algorithms.

Unfortunately, adding rebalancing to your code to bring the worst-case time down to O(n log n) will cause it to become incorrect, because it currently depends on a delicate property: It doesn't compare each inserted element with all smaller-valued elements to its right, but only with the ones you encounter on the path down from the root of the BST. Whenever during this traversal you encounter an element at position j with value nums[j] < nums[i], you ignore all elements in its left subtree. With your current implementation, this is safe: Although these elements are all known to be smaller than nums[i] by the BST property, they can't be further to the right than j is, because insertion order means that every child is to the left of its parent. But if you change the algorithm to perform tree rotations to rebalance the tree, the second property can be lost -- you could miss some element at position k with nums[k] < nums[j] < nums[i] but k > j.

Finally, having both a member variable root and a function argument root is confusing.