Search code examples
c++stlstdset

Efficient way to get subset of std::set


I call a library function that accepts pointer to std::set and processes elements of it.

However, it processes only certain number of elements (let's say 100) and if the set has more elements it simply throws an exception. However, I receive set of much larger size. So I need efficient way to get subset of std::set.

Currently, I am copying 100 elements to temporary set and pass it to the function.

struct MyClass
{
    // Class having considerably large size instance
};

// Library function that processes set having only 100 elements at a time
void ProcessSet (std::set<MyClass>* ptrMyClassObjectsSet);

void FunctionToProcessLargeSet (std::set<MyClass>& MyClassObjSet)
{
    std::set<MyClass> MyClass100ObjSet;

    // Cannot pass MyClassObject as it is to ProcessSet as it might have large number of elements
    // So create set of 100 elements and pass it to the function
    std::set<MyClass>::iterator it;
    for (it = MyClassObjSet.begin(); it != MyClassObjSet.end(); ++it)
    {
        MyClass100ObjSet.insert (*it);

        if (MyClass100ObjSet.size() == 100)
        {
            ProcessSet (&MyClass100ObjSet);
            MyClass100ObjSet.clear();
        }
    }

    // Prrocess remaining elments
    ProcessSet (&MyClass100ObjSet);
    MyClass100ObjSet.clear();
}

But it's impacting performance. Can anyone please suggest better ways to do this?


Solution

  • Since it looks like you are locked into having to use a subset. I have tweaked your code a little and I think it might be faster for you. It is still an O(n) operation but there is no branching in the for loop which should increase performance.

    void FunctionToProcessLargeSet(std::set<MyClass>& MyClassObjSet)
    {
        int iteration = MyClassOgjSet.size() / 100; // get number of times we have collection of 100
        auto it = MyClassObjSet.begin();
        auto end = MyClassObjSet.begin();
        for (; iteration == 0; --iteration)
        {
            std::advance(end, 100); // move end 100 away
            std::set<MyClass> MyClass100ObjSet(it, std::advance(it, end));  // construct with iterator range
            std::advance(it, 100); // advace it to end pos
            ProcessSet(&MyClass100ObjSet); // process subset
        }
        if (MyClassOgjSet.size() % 100 != 0)  // get last subset
        {
            std::set<MyClass> MyClass100ObjSet(it, MyClassObjSet.end());
            // Prrocess remaining elments
            ProcessSet(&MyClass100ObjSet);
        }
    }
    

    Let me know if this runs faster for you.