Search code examples
c++stlstdvectorstdset

How do I Insert std::vector into a std::set which has a custom sort function


Note that the problem here has been solved and it had nothing to do with the insert but rather an uninitialized struct member variable! Hopefully this question and its answer might help another rookie avoid such a mistake.

I want to insert a std::vector of file names into a std::set that has a customer sorter struct for ordering files by date instead of alphabetically.

With the default alphabetical ordering I can simply insert a vector into my set using:

std::set<std::string> mySet;
std::vector<std::string> myVector;

myVector.push_back("Banana.txt");
myVector.push_back("Apple.txt");
myVector.push_back("Cat.txt");

mySet.insert(myVector.begin(), myVector.end());

And this would give me exactly what I would expect: A std::set of file names that would be ordered alphabetically.

Now if I have a custom sorter that sorts by date instead of file name, like so:

struct DateOrderSorter
{
    bool operator()(const std::string& file1, const std::string& file2)
    {
        struct stat buf_stat1;
        struct stat buf_stat2;
        std::string fullpath1 = path + file1;
        std::string fullpath2 = path + file2;
        stat(&fullpath1[0], &buf_stat1);
        stat(&fullpath2[0], &buf_stat2);
        return buf_stat1.st_ctime < buf_stat2.st_ctime;
    }
    std::string path;
};

And I declare my set as:

std::set<std::string, DateOrderSorter>

and then declare an instance of DateOrderSorter:

DateOrderSorter dateOrderSorter;
dateOrderSorter.path = "C:/random_path_that_has_been_verified_to_work"

When I do the same insertion:

mySet.insert(myVector.begin(), myVector.end());

It returns just the first and the last files sorted by date. So just myVector.begin() and myVector.end().

  1. First of all, why this behavior?
  2. What can I do to get the vector inserted into my set ordered through my customer sorter.

I've tried

std::vector<std::string>::iterator vector_it = myVector.begin();
std::vector<std::string>::iterator vector_end = myVector.end();
for (; vector_it!= vector_end; ++vector_it) {
    mySet.insert(*vector_it);
}

But this didn't copy the vector fully and the order was very weird. It didn't follow name or date order..


Solution

  • The path member of the comparison function is default-initialized to "", hence stat errors (the error goes unnoticed because the return value is not checked in the code), and buf_stat1.st_ctime < buf_stat2.st_ctime is comparing uninitialized memory.

    Which means that the comparison function can return true for both a < b and a > b : this violates a rule of strict weak ordering, hence you observe this weird behaviour.

    To fix this, you can make path a static member, and verify that stat doesn't error, like so :

    struct DateOrderSorter
    {
        bool operator()(const std::string& file1, const std::string& file2)
        {
            struct stat buf_stat1;
            struct stat buf_stat2;
            std::string fullpath1 = path + file1;
            std::string fullpath2 = path + file2;
            int r = stat(&fullpath1[0], &buf_stat1);
            assert(r==0);
            r = stat(&fullpath2[0], &buf_stat2);
            assert(r==0);
            return buf_stat1.st_ctime < buf_stat2.st_ctime;
        }
        static std::string path;
    };
    

    and initialize it like so:

    DateOrderSorter::path = "...";