Search code examples
algorithmsub-arrayarray-algorithms

Find the length of the longest repeated subArray


Given an array of integers between 1 and 10^5 find in the best time and space the lenght of the longest repeated subArray. I was thinking in making binary search but i want to hear some suggestions. Thanks for helping !


Solution

  • Indeed binary search can be used but it will require a hashing of the array to make overall algorithm efficient, you can read more about rolling hash, rolling hash is like you create a hash for a subarray, so if you want to check two subarray for equality then you can just check their rolling hash in O(1) time, if they are equal they very very high probability is that they are equal depending upon your hash function, so you will binary search on the len of the required repeated subarray, i.e suppose length range is from 0 to (n/2) where n is the size of the array, 0 means no repeated subarray is there, so suppose we have our mid as potential answer, make a check function in which now make a hashmap where integer is key and value as vector which stores hash of all the starting positions of subarray of length mid

    unordered_map<int , vector<int>>pos;
    

    now traverse the array and store all the hashes as key and their starting pos in vector as they come,so if two hashes are being repeated, it will go in same vector, now once done, we have got max n distinct hashes, so traverse the hashes in map, and if size of vector is more that 1, then check the difference between pos of first and last element of the vector for that corresponding hash, if difference is >= len(or mid) then yeah you have got a subarray of length mid and store it in our answer, which is repeating, now here the magic of binary search comes, we can easily prove that if this subarray/substring is repeating then any subarray/substring of it is also repeating, so on the basis of this pattern, we try to go for more higher len which may be potential answer, i.e we update l = mid + 1, and suppose now the mid we get doesn't isn't valid len, so it is sure that no subarray having length greater than or equal to this can exist which is repeating, so we go for slightly lower range, i.e r = mid - 1, and do it till we are done with our binary search which will have maximum log(n/2) iteration and each check function will have n iteration in each iteration of binary search so total complexity for this algorithm (assuming you are using hashing and getting substring/subarray hash which can be retrieved in O(1) which is actually possible through some preprocessing of the original array first and making a new array having hashed value through which we can get subarray hash) is n * log(n/2) => O(n*log(n)) below is rough code in c++ for understanding

    #include<iostream>
    #include<unordered_map>
    #include<vector>
    using namespace std;
    bool check(vector<int> & a , int len){
    
        int n = a.size();
        unordered_map<int , vector<int>> pos;
    
        for(int i = 0; i < n - len + 1; ++i){
            int hash_value = subarray_hash(a , i , i + len - 1); // some function to get subarray hash, which I have not implementated for OP exercise
            pos[hash_value].push_back(i);
        }
        for(auto it = pos.begin(); it != pos.end(); ++it){
            vector<int> all_pos = *it;
            if(all_pos.size() > 1){
                int k = all_pos.size();
                if(all_pos[k - 1] - all_pos[0] >= len){
                    return true;
                }
            }
        }
        return false;
    }
    int main(){
    
        int n;
        cin >> n;
        vector<int>a(n);
        for(int i = 0; i < n; ++i){
            cin >> a[i];
        }
        int maxlen_possible = 0;
        int l = 0 , r = (n/2);
        while(l <= r){
            int mid = (l + (r - l)/2);
            if(check(a , mid)){
                maxlen_possible = mid;
                l = mid + 1;
            }
            else{
                r = mid - 1;
            }
        }
        cout << maxlen_possible << "\n";
    
    return 0;
    }
    

    now for calculating subarray/substring hash, you can refer to rolling hash on Internet, let me know if something is not clear.