Search code examples
c++sortingstdvectorswap

How can I order a vector using another vector?


This question is similar to the one posed here however, this answer does not work for my question and it is slightly different.

What I am trying to do would best be shown in code:

//this would be a copy version:
int main(){
   std::vector<uint32_t> order = {0,2,5,6,9,10,1,3,4,7,8,11};
   std::vector<uint32_t> example = {0,1,2,3,4,5,6,7,8,9,10,11};
   std::vector<uint32_t> output(order.size());
   for(uint32_t i = 0; i < order.size(); ++i){
       output[i] = example[order[i]];
   }
}
//output comes out to {0,2,5,6,9,10,1,3,4,7,8,11}

However, when I try to implement an in-place version using the reorder code from the link above as such:

void reorder(std::vector<uint32_t> &v, std::vector<uint32_t> const &order )  {   
    for ( int s = 1, d; s < order.size(); ++ s ) {
        for ( d = order[s]; d < s; d = order[d] ) ;
        if ( d == s ) while ( d = order[d], d != s ) std::swap( v[s], v[d] );
    }
}

int main(){
    std::vector<uint32_t> order = {0,2,5,6,9,10,1,3,4,7,8,11};
    std::vector<uint32_t> example = {0,1,2,3,4,5,6,7,8,9,10,11};
    reorder(example, order);
}
//example = {0,6,1,7,8,2,3,9,10,4,5,11,}

How could I implement an in-place version of what I am trying to accomplish without copying memory?

Edit

I wanted to make something more clear, the vector example from the code can be arbitrary elements, I just happened to make them initialized the way I did for the ease of checking them. This would be entirely valid:

std::vector<uint32_t> example = {300,12,21,34,47,15,61,57,82,94,1,2};
  • order and example will always contain the same number of elements
  • It is not guaranteed that order and example store the same datatype
  • It is also guaranteed that order stores unique values
  • It is also guaranteed that order will always be the datatype uint32_t
  • order is always going to be in a range from 0 to n-1 where n is the size of example, Each of those numbers, exactly once, and nothing outside of that range. (like it does in the example code)
  • The actual order of that range is entirely random, has nothing to do with even or odd indices coming in any order.

Solution

  • I've come up with a simple algorithm to do this in O(N) time with O(1) additional storage. Unsurprisingly, it does destroy the contents of order in the process. I don't think you can get around that without either increasing time complexity or storage complexity.

    template <typename T>
    void reorder(std::vector<T> &v, std::vector<uint32_t> &order)
    {
        for (size_t i = 0; i < v.size(); i++)
        {
            size_t next = order[i];      // find index of next output
            while (next < i)             // resolve chain of moves
            {
                next = order[next];
            }
            std::swap(v[i], v[next]);    // exchange data
            order[i] = next;             // record where previous data moved to
        }
    }
    

    The basic approach is that you iterate through the ordering one at a time, and you consider every position prior to the current position as solved. You will never modify the ordering or the data for solved positions.

    Except for the trivial case where order is a sorted vector, you will inevitably need to move a piece of data out of the way, so that you can bring in the required value at that position. There are two possible scenarios:

    1. The value is already in the right place, or is in some "future" position; or
    2. The value points at a "past" position, so we know it has been moved one or more times such that it's now in a "future" position.

    So, on a very basic level, when you discover that for some i, the value of order[i] is less than i, then you know it was moved, and that it moved to the position order[order[i]]. From that position, it may have been moved again. You'll know, by applying exactly the same test, until you end up with an index that's greater than or equal to i. And that's where the data moved to.

    The secret to making this O(N) is that, after resolving the final "moved-to" position, you do one final step which is to overwrite order[i] with that new position. That means whatever searching you had to do at this position will never be repeated.

    Now, it's not at all obvious that this results in linear time complexity, so don't feel bad if it's a bit mind-bending. I found it hard to convince myself, and before writing this answer I actually had to prove it experimentally by looking for the worst-case total searches for all permutations of order. That was also useful to verify the algorithm's correctness.

    The reasoning really boils down to the fact that the more "hops" a single value makes, the fewer hops are possible for other values. By collapsing the result of any search you ensure that search is also O(1) in future. Crucially, that means the process of chaining one extra hop is also O(1) because successive "hops" for the same data value will always point back to a search that was flattened when the data was moved.


    Here's the little test harness that verifies all permutations and counts the total number of search indirections performed:

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <vector>
    
    typedef int Data;
    
    template <typename T>
    size_t reorder(std::vector<T> &v, std::vector<uint32_t> &order)
    {
        size_t indirections = 0;
        for (size_t i = 0; i < v.size(); i++)
        {
            size_t next = order[i];
            while (next < i)
            {
                // std::cout << "search " << i << " : hop to " << next << "\n";
                next = order[next];
                indirections++;
            }
            std::swap(v[i], v[next]);
            order[i] = next;
        }
        return indirections;
    }
    
    size_t count_worst_case_indirections(size_t size)
    {
        size_t max_indirections = 0;
    
        std::vector<uint32_t> order_perm(size);
        std::vector<uint32_t> order_worst;
        std::vector<uint32_t> order;
        std::vector<Data> data(size);
        std::vector<Data> expected;
        expected.reserve(size);
    
        // Test all possible orderings
        std::iota(order_perm.begin(), order_perm.end(), 0);
        do
        {
            // Reset initial data and generate expected result
            order = order_perm;
            std::iota(data.begin(), data.end(), 0);
            expected.clear();
            for (auto i : order) expected.push_back(data[i]);
    
            // Run test
            size_t indirections = reorder(data, order);
            if (indirections > max_indirections)
            {
                max_indirections = indirections;
                order_worst = order_perm;
            }
    
            // Throw if result is invalid
            if (data != expected) throw "ALGORITHM IS BROKEN";
        } while (std::next_permutation(order_perm.begin(), order_perm.end()));
    
        std::cerr << "worst order : ";
        for (auto x : order_worst) std::cerr << x << ' ';
        std::cerr << "\n";
    
        return max_indirections;
    }
    
    int main()
    {
        for (size_t size = 1; size < 12; size++)
        {
            size_t max_indirections = count_worst_case_indirections(size);
            std::cout << "Size " << size << " : " << max_indirections << "\n";
        }
    }
    

    stdout:

    Size 1 : 0
    Size 2 : 1
    Size 3 : 2
    Size 4 : 3
    Size 5 : 4
    Size 6 : 5
    Size 7 : 6
    Size 8 : 7
    Size 9 : 8
    Size 10 : 9
    Size 11 : 10
    

    stderr:

    worst order : 0 
    worst order : 1 0 
    worst order : 1 2 0 
    worst order : 1 2 3 0 
    worst order : 1 2 3 4 0 
    worst order : 1 2 3 4 5 0 
    worst order : 1 2 3 4 5 6 0 
    worst order : 1 2 3 4 5 6 7 0 
    worst order : 1 2 3 4 5 6 7 8 0 
    worst order : 1 2 3 4 5 6 7 8 9 0 
    worst order : 1 2 3 4 5 6 7 8 9 10 0
    

    This is clearly worst-case O(N) in searches. If you comment-out the line order[i] = next; in reorder, you will see that you get a worst-case of O(N2).

    If you then set up a single-fire experiment with a worst-case ordering (there are many worst-case orderings), and uncomment the line of output in the search loop, you'll see exactly why it's important to flatten the search.