Search code examples
c++stringsuffix-treesuffix-array

count number of occurence of each substring?


Given a string S, I want to calculate number of substrings which occurred n times (1 <= n <= s.length()). I have done it with rolling hash, it can be done by using a suffix tree. How can it be solved using a suffix array in complexity O( n^2 ) ?

like for s = "ababaab"

n no.of string

4 1 "a" ( substring "a" is present 4 times)

3 2 "b" , "ab" (substring "b" and "ab" are present 3 times)

2 2 "ba" , "aba"

1 14 "aa" , "bab" , "baa" , "aab" , "abab" ....


Solution

  • This is not a forum to get free code, but since I was in such good mod this evning, i wrote a short example for you. But i cannot guarantee that is error free, this was written in 15 minutes without special much thoughts.

    #include <iostream>
    #include <cstdlib>
    #include <map>
    
    class CountStrings
    {
        private:
                const std::string               text;
                std::map <std::string, int>     occurrences;
    
                void addString ( std::string );
                void findString ( std::string );
    
        public:
                CountStrings ( std::string );
                std::map <std::string, int> count ( );
    };
    
    void CountStrings::addString ( std::string text)
    {
        std::map <std::string, int>::iterator iter;
    
        iter = ( this -> occurrences ).end ( );
    
        ( this -> occurrences ).insert ( iter, std::pair <std::string, int> ( text, 1 ));
    }
    
    void CountStrings::findString ( std::string text )
    {
        std::map <std::string, int>::iterator iter;
    
        if (( iter = ( this -> occurrences ).find ( text )) != ( this -> occurrences ).end ( ))
        {
                iter -> second ++;
        }
        else
        {
                this -> addString ( text );
        }
    }
    
    CountStrings::CountStrings ( std::string _text ) : text ( _text ) { }
    
    std::map <std::string, int> CountStrings::count ( )
    {
        for ( size_t offset = 0x00; offset < (( this -> text ).length ( )); offset ++ )
        {
                for ( size_t length = 0x01; length < (( this -> text ).length ( ) - (  offset - 0x01 )); length ++ )
                {
                        std::string subtext;
    
                        subtext = ( this -> text ).substr ( offset, length );
    
                        this -> findString ( subtext );
                }
        }
    
        return ( this -> occurrences );
    }
    
    int main ( int argc, char **argv )
    {
        std::string text = "ababaab";
        CountStrings cs ( text );
        std::map <std::string, int> result = cs.count ( );
    
        for ( std::map <std::string, int>::iterator iter = result.begin ( ); iter != result.end ( ); ++ iter )
        {
                std::cout << iter -> second << " " << iter -> first << std::endl;
        }
    
        return EXIT_SUCCESS;
    

    }