Search code examples
c++set

Efficient & safe intersection of more than two sets


Here is a C++ program which uses std::set_intersection twice to compute the intersection of 3 sets (then prints the result). It produces the expected result 3 but:

  1. Is it safe to pass in 'newset' as both a source & destination set in the second call to set_intersection? As I understand it, with begin() and end() I'm passing references to these sets, so might I end up writing over my input by accident?

  2. Is there a more efficient approach here? Should I iterate over my sets in ascending size order? Is there any advantage in rolling my own multi-set intersection as opposed to multiple calls to std::set_intersection?

#include <algorithm>
#include <iostream>
#include <set>

int main()
{
    std::set<int> set_1 = {1,2,3}, set_2 = {2,3}, set_3 = {3}, newset;
    
    std::set_intersection(set_1.begin(), set_1.end(),
                  set_2.begin(), set_2.end(),
                  std::inserter(newset, newset.begin()));

    std::set_intersection(newset.begin(), newset.end(),
                  set_3.begin(), set_3.end(),
                  std::inserter(newset, newset.begin()));

    for(std::set<int>::iterator it = newset.begin(); it != newset.end(); it++){
        std::cout << *it;
    }
    std::cout << std::endl;
    
    return 0;
}

Solution

  • As you can read on cppreference,

    [...] The resulting range cannot overlap with either of the input ranges.

    so you're in undefined behavior land.

    As a proof by verification of this, I can tell you that I've copied your code, compiled it, run it, and for me it prints 23, so your correct result is just a coincidence.

    Therefore, it looks like to have to rely on another temporary.

    The STL doesn't seem to contain a solution for intersecting more than two sets, and you can't even use std::set_intersection in a nested fashion (e.g. result = my_set_intersection(set_1, my_set_intersection(set_2,set_3)), the reason being pretty simple: the algorithm's interface is "tainted" by iterators, i.e. it takes begin and end iterators to the sets, rather than the sets themselves as inputs; and it also returns an iterator.

    Porbably Boost has something useful, but I haven't found it yet.