Search code examples
c++recursionoptimizationboggle

Optimize this recursive function [Boggle resolver]


I realized a recursive function to solve this problem:

  • With a 4x4 Matrix composed of letters (A...Z), for every letter get all the n-lenght words by following only adiacent and diagonal cells.

My function's idea is to do a recursive call for every letter to every possible direction and then concatenate the string that it is going to be formed. When the lenght of this string will be n, it stops. To prevent the returning back to a letter already used, I created and array of pointers to letter's class which compares every new possible letter with all the letters used. For doing this I created a special class for every letter.

This is the function: (sorry, it's very long)

dirSu means Up - dirGiu Down - dirDx right - dirSx left - and all the Others the diagonal positions

    void decisione(lettera *start, lettera *next, int count, string seq, vector <lettera*> visitate, int quanti) {

        if (next == nullptr) return ;

        if (count == quanti) {
            bool test = false;

            for (int k = 0; k < start->sequenze.size(); k++) { //To prevent same letter to be used
                if (start->sequenze[k] == seq)  {
                    test = true;
                    break;
                }
            }
            if (!test) start->sequenze.push_back(seq);
            return ;
        }

        seq = seq + next->chiave; //String concatenating every call
        count++; //Counter to decide when stop
        visitate.push_back(next); //Array of used letters

        int i;
        bool test;

        //Decide direction
        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);


        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirGiu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirGiu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirDx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirDx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAsx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBsx, count, seq, visitate, quanti);

    }

Well, when I activate the function for words of 3 letters, it is really fast, 4 letters: 0.028s of calculation time per letter in the matrix, 5 letters: 0.225s, 6 letters: 1.891s, 7 letters: 14.914s.

Is there a way to reduce calculation time by optimizing my function? (or removing my stupid logical errors?) Thanks a lot!

The Matrix I used for my tests is:

C A S A
C O S A
B A C I
O T A T

I used this for every speed test (without calculating it by frequency or randomly) and with this (with my recursive function improved since my question) I get in about 90 seconds all the words with 5, 6, 7 letters for every starting letter. There are a lot of words, although it is only a 4x4 Matrix! If I can send it to you privately it could be interesting to see also your solution to the same problem. (I don't know how to contact you here)


Solution

  • Change your algorithm from recursive to iterative (with a loop). It's usually faster. After that, the other optimizations are related to the algorithm and the data structures, in order to reduce the complexity. It's much more difficult to find such optimizations (but is possible).

    EDIT: some thoughts on the code, followed by a version of the word-search I've done today:

    • you want only english words (or italian ones, etc...). To check this you should use a dictionary, it is the most efficient structure for look-up of words, both in terms of memory and performances
    • a comment said that this algorithm is strongly recursive. I disagree, I've made an iterative one, with much less copy of vectors, and with word check. Passing an algorithm from recursive to iterative usually reduce the number of copies. But I still agree with one point of this comment: some algorithm may be faster in recursive form (but hey're not so common). Also, this algorithm was not simple to find ;)
    • You should split your code. It will be way more readable, and easier to optimize.
    • You're coding a boggle resolver, so you should be aware that your dictionary must be precise, and should contain all forms of verbs (for example, for 'be': am, are, is, was, were, etc...). I tested it in french because the english word list I have is only 850 words, which is way too short.
    • The conditions on letter with accents in the dictionary are not important, they're here just to avoid using more slots for accents. Also, I striped every composed word (with caret in it) from the text file.
    • On my computer, loading the dictionary (non-exhaustive 350000 french words, 3 Mb) and searching all 5,6 and 7-length words take approximatively 0.6 seconds (i7 2660K, 8Gb RAM, release with /0x, measured by me, not by the computer ;) ). The perf measurement will be relevant only on your machine, for comparison.

    So, here is the code. It's a standalone, you just have to compile it, execute with the english (or italian or french) words file (text, one word per line):

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <vector>
    #include <time.h>
    #include <map>
    #include <algorithm>
    
    // **************************************************** //
    //                                                      //
    // Data structure for the words : dictionnary           //
    //                                                      //
    // **************************************************** //
    struct DicNode
    {
        const char l;
        bool isEndOfWord;
        DicNode* children[26]; // a - z, case insensitive
        DicNode* parent;
        unsigned int min, max, level;
    
        DicNode(char c, DicNode* p = 0, bool end = false) : l(c),isEndOfWord(end),parent(p),min((unsigned int)(-1)),max(0), level(parent ? parent->level + 1 : 0)
        {
            for(unsigned int t = 0; t < 26; t++)
                children[t] = 0;
        }
    
        ~DicNode()
        {
            for(unsigned int t = 0; t < 26; t++)
                if(children[t])
                    delete children[t];
        }
    
        DicNode*& at(char l)
        {
            if(l == 'à' || l == 'ä' || l == 'â')
                l = 'a';
            if(l == 'é' || l == 'è' || l == 'ê' || l == 'ë')
                l = 'e';
            if(l == 'ï' || l == 'î')
                l = 'i';
            if(l == 'ö' || l == 'ô')
                l = 'o';
            if(l == 'ü' || l == 'û' || l == 'ù')
                l = 'u';
            if(l == 'ç')
                l = 'c';
            if(l <= 'Z')
                l = 'a' + l - 'A';
    
            // Note: undefined behavior if l > 'Z' or l < 'A'.
            return children[(unsigned int)(l - 'a')];
        }
    
        bool inRange(unsigned int n) const
        {
            return n <= max && n >= min;
        }
    
        void print(std::string str = "") const
        {
            // Here recursive algorithm is way better because of the object-oriented structure
            if(isEndOfWord)
                std::cout << (str + l) << "\n";
    
            for(unsigned int t = 0; t < 26; t++)
                if(children[t])
                    children[t]->print(str + l);
        }
    
        std::string toString() const
        {
            std::string str(level, '0');
            const DicNode* node = this;
            while(node && node->level > 0)
            {
                str[node->level - 1] = node->l;
                node = node->parent;
            }
            return str;
        }
    };
    
    void addWord(DicNode* root, const std::string& str)
    {
        if(!root || str == "") return;
        DicNode* node = root;
        unsigned int i = 0;
        while(i != str.size())
        {
            if(str.size() - i > node->max) node->max = str.size() - i;
            if(str.size() - i < node->min) node->min = str.size() - i;
    
            char c = tolower(str[i]);
            DicNode*& n = node->at(c);
            if(!n) n = new DicNode(c,node);
            node = n;
            i++;
        }
        node->isEndOfWord = true;
    }
    
    // **************************************************** //
    //                                                      //
    // Data structures for the algorithm                    //
    //                                                      //
    // **************************************************** //
    
    enum Direction
    {
        INVALID,
        NORTH,
        NORTH_EAST,
        EAST,
        SOUTH_EAST,
        SOUTH,
        SOUTH_WEST,
        WEST,
        NORTH_WEST,
        FINISHED
    };
    
    Direction incDir(Direction d)
    {
        switch(d)
        {
            case NORTH:         return NORTH_EAST;
            case NORTH_EAST:    return EAST;
            case EAST:          return SOUTH_EAST;
            case SOUTH_EAST:    return SOUTH;
            case SOUTH:         return SOUTH_WEST;
            case SOUTH_WEST:    return WEST;
            case WEST:          return NORTH_WEST;
            case NORTH_WEST:    return NORTH;
            default:            return INVALID;
        }
    }
    
    Direction operator-(Direction d)
    {
        switch(d)
        {
            case NORTH:         return SOUTH;
            case NORTH_EAST:    return SOUTH_WEST;
            case EAST:          return WEST;
            case SOUTH_EAST:    return NORTH_WEST;
            case SOUTH:         return NORTH;
            case SOUTH_WEST:    return NORTH_EAST;
            case WEST:          return EAST;
            case NORTH_WEST:    return SOUTH_EAST;
            default:            return INVALID;
        }
    }
    
    std::string toString(Direction d)
    {
        switch(d)
        {
            case NORTH:         return "NORTH";
            case NORTH_EAST:    return "NORTH_EAST";
            case EAST:          return "EAST";
            case SOUTH_EAST:    return "SOUTH_EAST";
            case SOUTH:         return "SOUTH";
            case SOUTH_WEST:    return "SOUTH_WEST";
            case WEST:          return "WEST";
            case NORTH_WEST:    return "NORTH_WEST";
            default:            return "INVALID";
        }
    }
    
    struct Cell
    {
        char l;
        DicNode* node;
        Direction dir, dirParent;
        Cell(char l_ = 'A') : l(l_),node(0),dir(INVALID),dirParent(INVALID) {}
    };
    
    struct ProbLetter
    {
        char letter;
        float proba;
    };
    
    class Map
    {
        public:
            Map(unsigned int w, unsigned int h) : width(w), height(h)
            {
                data = new Cell[width * height];
                for(unsigned int t = 0; t < width * height; t++)
                    data[t] = randomEnglishLetter();
            }
    
            static char randomEnglishLetter()
            {
                // Frequency of english letters
                static ProbLetter probas[26] =
                {
                    { 'Z', 0.074f },
                    { 'Q', 0.095f },
                    { 'X', 0.150f },
                    { 'J', 0.153f },
                    { 'K', 0.772f },
                    { 'V', 0.978f },
                    { 'B', 1.492f },
                    { 'P', 1.929f },
                    { 'Y', 1.974f },
                    { 'G', 2.015f },
                    { 'F', 2.228f },
                    { 'W', 2.360f },
                    { 'M', 2.406f },
                    { 'U', 2.758f },
                    { 'C', 2.782f },
                    { 'L', 4.025f },
                    { 'D', 4.253f },
                    { 'R', 5.987f },
                    { 'H', 6.094f },
                    { 'S', 6.327f },
                    { 'N', 6.749f },
                    { 'I', 6.966f },
                    { 'O', 7.507f },
                    { 'A', 8.167f },
                    { 'T', 9.056f },
                    { 'E', 12.702f }
                };
    
                // Random number between 0 and 1
                float p = 100.0f * float(rand() % 10000) / 9999.0f;
                float sum = 0.0f;
                for(unsigned int t = 0; t < 26; t++)
                {
                    sum += probas[t].proba;
                    if(p < sum) return probas[t].letter;
                }
                return probas[25].letter;
            }
    
            Cell& operator()(unsigned int x, unsigned int y)
            {
                return data[x + y * width];
            }
    
            bool inRange(int x, int y)
            {
                return x >= 0 && x < (int)width && y >= 0 && y < (int)height;
            }
    
            void print()
            {
                for(unsigned int y = 0; y < height; y++)
                {
                    for(unsigned int x = 0; x < width; x++)
                        std::cout << "  " << data[x + y * width].l;
                    std::cout << "\n";
                }
            }
    
            unsigned int getWidth() const { return width; }
            unsigned int getHeight() const { return height; }
    
        private:
            unsigned int width, height;
            Cell* data;
    };
    
    
    // **************************************************** //
    //                                                      //
    // Word-search algorithm                                //
    //                                                      //
    // **************************************************** //
    
    inline void advance(int& x, int& y, Direction d)
    {
        switch(d)
        {
            case NORTH:
                y--;
                return;
            case NORTH_EAST:
                x++;
                y--;
                return;
            case EAST:
                x++;
                return;
            case SOUTH_EAST:
                x++;
                y++;
                return;
            case SOUTH:
                y++;
                return;
            case SOUTH_WEST:
                x--;
                y++;
                return;
            case WEST:
                x--;
                return;
            case NORTH_WEST:
                x--;
                y--;
                return;
            default:
                return;
        }
    }
    
    struct Data
    {
        Map& map;
        int x;
        int y;
    };
    
    bool descend(Data& dat, unsigned int n)
    {
        DicNode* parent = dat.map(dat.x, dat.y).node;
        if(n == parent->level) return false;
    
        Direction dir = dat.map(dat.x, dat.y).dir;
        if(dir == FINISHED) return false;
        else if(dir == INVALID) dir = NORTH;
    
        int pX = dat.x, pY = dat.y;
        bool firstIteration = true;
        for(; firstIteration || dir != NORTH; dir = incDir(dir))
        {
            firstIteration = false;
            pX = dat.x;
            pY = dat.y;
            advance(pX, pY, dir);
    
            if(dat.map.inRange(pX, pY) // (pX, pY) is not outside the map
                && dat.map(pX, pY).node == 0 // The cell is not already used
                && parent->at(dat.map(pX, pY).l) != 0) // An entry in the dictionary exists
            {
                // We found the next node
                dat.map(dat.x, dat.y).dir = dir;
                dat.map(pX, pY).node = parent->at(dat.map(pX, pY).l);
                dat.map(pX, pY).dirParent = -dir;
                dat.x = pX;
                dat.y = pY;
    
                return true;
            }
        }
    
        return false;
    }
    
    bool ascend(Data& dat)
    {
        // Go back on the previous node
    
        Direction dp = dat.map(dat.x, dat.y).dirParent;
        if(dp == INVALID) return false;
    
        dat.map(dat.x, dat.y).dir = INVALID;
        dat.map(dat.x, dat.y).dirParent = INVALID;
        dat.map(dat.x, dat.y).node = 0;
        advance(dat.x, dat.y, dp);
    
        dat.map(dat.x, dat.y).dir = incDir(dat.map(dat.x, dat.y).dir);
        if(dat.map(dat.x, dat.y).dir == NORTH)
            dat.map(dat.x, dat.y).dir = FINISHED;
    
        return true;
    }
    
    void findWordsFromPosition(Map& map, DicNode* parent, unsigned int x, unsigned int y, unsigned int n, std::vector<std::string>& output)
    {
        if(!parent) return;
    
        bool ok = true;
    
        // Setup first node
        map(x, y).node = parent;
        map(x, y).dir = NORTH;
    
        // while we can find next node with direction
        // If marked node has right order and is end of word, or if condition on n is not verified:
        //     add it to the output (if condition on n is verified)
        //     no need to go further (because of n), so advance direction of parent cell, reset current cell, and go up for the current position
        Data dat = { map, x, y };
        while(ok)
        {
            if(map(dat.x,dat.y).node->toString() == "ane")
                std::cout << "ok\n";
            if(!descend(dat, n))
                ok = ascend(dat);
            else
            {
                DicNode* node = dat.map(dat.x, dat.y).node;
                if(node->level == n && node->isEndOfWord)
                {
                    std::string str = node->toString();
                    if(std::find(output.begin(), output.end(), str) == output.end())
                        output.push_back(str);
                    ok = ascend(dat);
                }
                else if(!node->inRange(n - node->level))
                    ok = ascend(dat);
            }
        }
    
        // Clean first node
        map(x, y).dir = INVALID;
        map(x, y).dirParent = INVALID;
        map(x, y).node = 0;
    }
    
    void getWords(Map& map, DicNode& dic, unsigned int n, std::vector<std::string>& output)
    {
        if(n > dic.max || n < dic.min) return;
    
        // Search words for every position
        for(unsigned int y = 0; y < map.getHeight(); y++)
            for(unsigned int x = 0; x < map.getWidth(); x++)
                findWordsFromPosition(map,dic.at(map(x,y).l),x,y,n,output);
    }
    
    #include <fstream>
    
    int main()
    {
        srand((unsigned int)time(0));
        // Prepare the words, for example, load them from a real dictionary
        DicNode dictionary(0);
        unsigned int maxSize = 0;
    
        // Note: the following code make sense only if you load words from a file
        {
            std::ifstream dico("english.txt");
            std::string str;
            int count = 0;
            while(dico >> str)
            {
                if(str.size() > maxSize) maxSize = str.size();
                addWord(&dictionary,str);
                count++;
            }
            std::cout << "Read " << count << " words from dictionary, longest with " << maxSize << " letters.\n";
        }
    
        // Prepare the matrix
        Map mat(4,4);
        /*
        mat(0,0) = 'A';
        mat(1,0) = 'N';
        mat(2,0) = 'F';
        mat(3,0) = 'N';
        mat(0,1) = 'S';
        mat(1,1) = 'L';
        mat(2,1) = 'E';
        mat(3,1) = 'R';
        mat(0,2) = 'G';
        mat(1,2) = 'O';
        mat(2,2) = 'R';
        mat(3,2) = 'R';
        mat(0,3) = 'S';
        mat(1,3) = 'I';
        mat(2,3) = 'U';
        mat(3,3) = 'I';
        */
        std::cout << "\nMatrix:\n";
        mat.print();
    
        // Search words
        std::vector<std::string> words;
        for(unsigned int t = 5; t <= 7; t++)
            getWords(mat,dictionary,t,words);
        std::cout << "\nWords:\n";
        if(words.size() == 0)
            std::cout << " (no words)\n";
        else
            for(unsigned int t = 0; t < words.size(); t++)
                std::cout << " " << words[t] << "\n";
    }
    

    Note: the previous code was a code-search resolver. It now removed from the answer, but you can still get it by searching in the edit revisions of this answer.

    EDIT2: Some verbal explanations

    The dictionary. It's a kind of a radix-tree, but each of the node have only one letter to store. It can have up to 26 children (more if you include accents and/or digits), one for each letter of the considered alphabet. The basic layout is the following:

    Word tree http://img571.imageshack.us/img571/2281/wtree.png

    Example:

    An example of a dictionary http://img706.imageshack.us/img706/1244/wordsr.png

    Each node also contains a boolean which indicates whether it's the end of a word. Additionnally, each node store the minimum and maximum size of its children branchs, its parent, and its level from the root (= size of the prefix). It allows to stop the search of words when we see that no children branch can give a word with a specific length.

    The matrix. The matrix is a double array of letters. But, for more efficiency, I added some data: its corresponding node in the dictionary (see algorithm), the direction to its children (see algorithm) and to its parent (see algorithm).

    The algorithm. The idea is to 'circle' in the map, by 'descending' (increasing the path representing the current found string). When descending is not possible, we have to 'ascend':

    FIND WORD:
    ----------
    
    while(we can continue)
        try descend
        if(failed to descend)
            ascend
        else
            node = node of current cell
            if node is end of word
                add it to the found words, then ascend
            else if node is out of range (ie, n = 5, but with node you can only have words with 3 characters)
                ascend
    
    DESCEND:
    --------
    
    we're from cell (x,y), with node D
    for each direction (from cell.direction to north-west, clockwise)
        advance position (x,y) with direction
        if the position is valid
            and we haven't already visited the cell for this word
            and the node got with D.children[cell.letter] exists
            set the current position of the algorithm to the result of the advance
            set the direction of the old cell to the direction used to advance
            set the direction of the new cell to the opposite (pointing to the old cell)
            set the node of the new cell by the found one
            return
    
    ASCEND:
    -------
    
    we're at (x,y)
    cell.node = 0 (the cell is not considered as visited anymore)
    advance the parent direction by one clockwise (if this direction is north-west, we use a special direction that say that we cannot rotate anymore: FINISHED)
    cell.dir = INVALID
    advance (x,y) following cell.direction_to_parent
    cell.direction_to_parent = INVALID
    set the current position to the result of advance
    

    An image explanation:

    Algorithm unrolled http://img822.imageshack.us/img822/1374/algow.png

    Steps (number indicated in the losanges):

    1. Here is our matrix. I tested it with a french dictionary. Let's say we start from the (0,0) cell (the algorithm works wherever you start).
    2. The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
    3. The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
    4. The first valid direction (word + in map) is east. We advance following it. The node is not the end of a word, and is still valid.
    5. There are no valid directions (either out-of-map, cell already visited (the E), or not in dictionary), so we ascend
    6. The first valid direction (word + in map) is south-east. We advance following it. The node is not the end of a word, and is still valid.
    7. The first valid direction (word + in map) is south. We advance following it. The node is not the end of a word, and is still valid.
    8. The first valid direction (word + in map) is west. We advance following it. The node is the end of a word (which is "anerie"), so we add it to the list of found words, and we ascend. For information, "ânerie" in french is "nonsense" or "stupidity" in english.
    9. And so on...