Search code examples
c++stlsetstl-algorithmset-intersection

std::set_intersection on two completely different containers


I had a simple requirement where I need to find the occurance of strings in one vector from a master list of strings in another vector. I was able to do it easily at first with:

vector<string> custom_list;
set<string> master_list;
vector<string> target_list;

std::sort(custom_list.begin(), custom_list.end());
std::set_intersection(custom_list.begin(), custom_list.end(), master_list.begin(),
                      master_list.end(), back_inserter(target_list));

This worked just fine. But later it turned out that each string in the master_list was associated with an identifier. I was hoping I could use std::set_intersection in such a way that I can use the intersected elements in target_list as an index to get at their identifiers. In effect I thought I'd change master_list to a map, like so:

map<string, SomeCustomId> master_list;

and be able to do something like:

auto I_want_this_id = master_list[target_list[0]);    

But now I am not sure if I can use set_intersection to compare two completely different containers (custom_list, a vector and master_list, a map) even if I write my own comparison function. Something like:

struct mycomparer {
    bool operator()(string const& lhs, pair<string, SomeCustomId> const& rhs) {
        return lhs == rhs.first;
    }
};

This doesn't quite work (I got a variety of compiler errors) and intuitively too, something about that felt wrong to me.

Is there a better way to accomplish what I am trying to do?


Solution

  • std::set_intersection expects a comparer that returns true if lhs < rhs, not if lhs == rhs. It also has to be able to compare its two arguments regardless of order (after all, determining if arguments are equivalent is done by (!comp(a, b) && !comp(b, a))).

    Thus, you'd want something like

    struct mycomparer {
        bool operator()(string const& lhs, pair<string const, SomeCustomId> const& rhs) {
            return lhs < rhs.first;
        }
        bool operator()(pair<string const, SomeCustomId> const& lhs, string const& rhs) {
            return lhs.first < rhs;
        }
    };
    

    Demo.

    Edit: Updated demo code to include all the necessary headers. (<iterator> and <string> were missing. They were probably included by other headers in GCC, but not in VC++.)

    VC++ 2012, when doing a debug build, appears to run some extra tests on the supplied predicate. This causes compilation to fail with errors like error C2664: 'bool mycomparer::operator ()(const std::pair<_Ty1,_Ty2> &,const std::string &)' : cannot convert parameter 1 from 'std::basic_string<_Elem,_Traits,_Alloc>' to 'const std::pair<_Ty1,_Ty2> &'. (It compiles fine for me on release build, once the headers are fixed and I switched to the old initialization style.)

    To fix this, supply overloads of operator () taking all four possible combinations of parameters:

    struct mycomparer {
        bool operator()(string const& lhs, pair<string const, SomeCustomId> const& rhs) {
            return lhs < rhs.first;
        }
        bool operator()(pair<string const, SomeCustomId> const& lhs, string const& rhs) {
            return lhs.first < rhs;
        }
        bool operator()(string const& lhs, string const& rhs) {
            return lhs < rhs;
        }
        bool operator()(pair<string const, SomeCustomId> const& lhs,
                        pair<string const, SomeCustomId> const& rhs) {
            return lhs.first < rhs.first;
        }
    };
    

    Edit 2: If you can use Boost.Range, it is much easier. Simply:

    boost::set_intersection(custom_list, 
                            master_list | boost::adaptors::map_keys,
                            back_inserter(target_list));
    

    No custom predicates required, and also very readable. Demo.