Search code examples
c++performancerandom-walk

How to verify if a vector has a value at a certain index


In a "self-avoiding random walk" situation, I have a 2-dimensional vector with a configuration of step-coordinates. I want to be able to check if a certain site has been occupied, but the problem is that the axis can be zero, so checking if the fabs() of the coordinate is true (or that it has a value), won't work. Therefore, I've considered looping through the steps and checking if my coordinate equals another coordinate on all axis, and if it does, stepping back and trying again (a so-called depth-first approach).

Is there a more efficient way to do this? I've seen someone use a boolean array with all possible coordinates, like so:

bool occupied[nMax][nMax];          // true if lattice site is occupied
for (int y = -rMax; y <= rMax; y++)
        for (int x = -rMax; x <= rMax; x++)
            occupied[index(y)][index(x)] = false;

But, in my program the number of dimensions is unknown, so would an approach such as:

typedef std::vector<std::vector<long int>> WalkVec;
WalkVec walk(1, std::vector<long int>(dof,0));
siteVisited = false; counter = 0;  
            while (counter < (walkVec.back().size()-1))
            {
                tdof = 1;
                while (tdof <= dimensions)
                {

                    if (walkHist.back().at(tdof-1) == walkHist.at(counter).at(tdof-1) || walkHist.back().at(tdof-1) == 0)
                    {
                        siteVisited = true;
                    }
                    else
                    {
                        siteVisited = false;
                        break;
                    }
                    tdof++;
                }

work where dof if the number of dimensions. (the check for zero checks if the position is the origin. Three zero coordinates, or three visited coordinates on the same step is the only way to make it true) Is there a more efficient way of doing it?


Solution

  • You can do this check in O(log n) or O(1) time using STL's set or unordered_set respectively. The unordered_set container requires you to write a custom hash function for your coordinates, while the set container only needs you to provide a comparison function. The set implementation is particularly easy, and logarithmic time should be fast enough:

    #include <iostream>
    #include <set>
    #include <vector>
    #include <cassert>
    
    class Position {
    public:
        Position(const std::vector<long int> &c)
            : m_coords(c) { }
    
        size_t dim() const { return m_coords.size(); }
        bool operator <(const Position &b) const {
            assert(b.dim() == dim());
            for (size_t i = 0; i < dim(); ++i) {
                if (m_coords[i] < b.m_coords[i])
                    return true;
                if (m_coords[i] > b.m_coords[i])
                    return false;
            }
            return false;
        }
    
    private:
        std::vector<long int> m_coords;
    };
    
    int main(int argc, const char *argv[])
    {
        std::set<Position> visited;
        std::vector<long int> coords(3, 0);
        visited.insert(Position(coords));
    
        while (true) {
            std::cout << "x, y, z: ";
            std::cin >> coords[0] >> coords[1] >> coords[2];
            Position candidate(coords);
            if (visited.find(candidate) != visited.end())
                std::cout << "Aready visited!" << std::endl;
            else
                visited.insert(candidate);
        }
        return 0;
    }
    

    Of course, as iavr mentions, any of these approaches will require O(n) storage.

    Edit: The basic idea here is very simple. The goal is to store all the visited locations in a way that allows you to quickly check if a particular location has been visited. Your solution had to scan through all the visited locations to do this check, which makes it O(n), where n is the number of visited locations. To do this faster, you need a way to rule out most of the visited locations so you don't have to compare against them at all.

    You can understand my set-based solution by thinking of a binary search on a sorted array. First you come up with a way to compare (sort) the D-dimensional locations. That's what the Position class' < operator is doing. As iavr pointed out in the comments, this is basically just a lexicographic comparison. Then, when all the visited locations are sorted in this order, you can run a binary search to check if the candidate point has been visited: you recursively check if the candidate would be found in the upper or lower half of the list, eliminating half of the remaining list from comparison at each step. This halving of the search domain at each step gives you logarithmic complexity, O(log n).

    The STL set container is just a nice data structure that keeps your elements in sorted order as you insert and remove them, ensuring insertion, removal, and queries are all fast. In case you're curious, the STL implementation I use uses a red-black tree to implement this data structure, but from your perspective this is irrelevant; all that matters is that, once you give it a way to compare elements (the < operator), inserting elements into the collection (set::insert) and asking if an element is in the collection (set::find) are O(log n). I check against the origin by just adding it to the visited set--no reason to treat it specially.

    The unordered_set is a hash table, an asymptotically more efficient data structure (O(1)), but a harder one to use because you must write a good hash function. Also, for your application, going from O(n) to O(log n) should be plenty good enough.