c++performance queue breadth-first-search

Breadth-first search takes excessive time depending on veto insertion position

The problem being solved is LeetCode #752 "Open the Lock". A summary:

You have a combo lock with four one-digit wheels, starting at 0, 0, 0, 0. You want to determine the minimum number of movements required to change the lock to show a, b, c, d. In each movement, you can spin one wheel forward or backwards, e.g. the first wheel from 0 to 1 or 0 to 9.

There are "deadend" combinations. If you reach a deadend in your sequence of movements, the lock becomes stuck. So you must reach the target combination without encountering a deadend.

For example:

 deadends = ["0201","0101","0102","1212","2002"], target = "0202"

A sequence of valid moves would be "0000" -> "1000" -> "1100" -> "1200" -> "1201" -> "1202" -> "0202". Note that a sequence like "0000" -> "0001" -> "0002" -> "0102" -> "0202" would be invalid, because the wheels of the lock become stuck after the display becomes the dead end "0102". (copied from problem description)

I devised a breadth-first search using a queue and two sets. One set containing the deadends, and one set containing the already considered combinations.

If I insert a combination into the "considered" set after popping from the queue, my solution takes too much time to execute. If I insert it immediately after the combination is generated, then my solution executes much faster.

Why? I think they are effectively the same! For combinations "close" to 0, 0, 0, 0, the "slow" solution still manages to converge. For combinations far away, it does not converge within the time limit.

class Solution {
public:
    int openLock(vector<string>& deadends, string target) {
        // "minimum total number of turns" suggests BFS (queue)
        
        // time complexity is O(C^N) where C is constant (number of choices for each digit, e.g. ten, or 24 if a...z) and N is the length of the combo
        // other terms are N^2 and + N_deadends
        // string operations are N^2: for each of the N digits, we need to construct a string of length N (itself an O(N) operation) -> O(N*N*2) (twice for +1 and -1)
        int nMoves = 0;
        
        std::queue<std::string> toConsider;
        toConsider.push("0000");
        std::set<std::string> considered;
        
        std::set<std::string> de;
        for (auto d : deadends) de.insert(d); // converting the deadends vector to a set improves speed significantly

        while (!toConsider.empty()) {
            int const nStatesInLevel = toConsider.size();
            
            for (int i = 1; i <= nStatesInLevel; ++i) {
                std::string state = toConsider.front();
                toConsider.pop();
                // considered.insert(state); // IF WE PUT THIS HERE INSTEAD OF BELOW, THE SOLUTION IS TOO SLOW!
                if (de.find(state) != de.end()) continue; // veto if dead-end
                if (state == target) return nMoves;
                
                for (int i : {0, 1, 2, 3}) { // one out of four wheels to turn
                    int const oldWheelVal = state.at(i) - '0';
                    for (int c : {1, -1}) { // increase or decrease wheel value
                        int newWheelVal = oldWheelVal + c;
                        if (newWheelVal == -1) newWheelVal = 9;
                        if (newWheelVal == 10) newWheelVal = 0;
                        std::string newWheel = state;
                        newWheel.at(i) = newWheelVal + '0';
          
                        if (considered.find(newWheel) == considered.end()) {
                            toConsider.push(newWheel);
                            considered.insert(newWheel); // we need to put this here instead of after it is popped in order to avoid "time limit exceeded".
                            // I think both places should effectively be the same!
                        }
                    }
                }
            }
            
            nMoves++;
            
        }
        
        return -1;
    }
};

Solution

The slow version will allow duplicate states to be added to the queue. That they are duplicate is only detected when they are pulled from it, but then the "harm" has already been done. As there are many different paths to get to a same state, this will soon make the queue grow unnecessarily large, populated with lots of duplicates. This not only comes at a memory cost, but also at a time cost, as the additional memory allocation takes time.

First the initial state 0000 gets pulled from the queue, and marked as visited, and then the following will be appended to it:

1000, 9000, 0100, 0900, 0010, 0090, 0001, 0009

Then in the next iteration, 1000 will be pulled, and the following will be added to the queue. 0000 will be found, but not added to the queue (marked as ----):

2000, ----, 1100, 1900, 1010, 1090, 1001, 1009

Then 9000 will be pulled from the queue, and the following added:

----, 8000, 9100, 9900, 9010, 9090, 9001, 9009

Then 0100 will be pulled from the queue, and following appended. Again, 0000 will not be added, however, other states are found again no, but have not yet been pulled from the queue, and so they are not detected as duplicates (marked with asterisk):

1100*, 9100*, 0200, ----, 0110, 0190, 0101, 0109

...and so on. The further we go, the more duplicates get added.

The faster version will never push duplicate states to the queue.