
How do I move items from a boost::variant to a multimap?


I'd like to improve the performance of PickPotatoes in the code below by moving instead of copying, but I can't figure out how to do that with insert and a boost::variant. In my actual use case, parsing the data takes about 75% of the time, and the real version of PickPotatoes takes about 25%, due to some slow copies. By improving PickPotatoes I should be able to get that down. Is it possible to move something out of a boost::variant and improve PickPotatoes?

#include <cstdlib>   // std::rand, std::srand, RAND_MAX
#include <map>
#include "boost/variant.hpp"
#include <string>
#include <utility>   // std::pair, std::move
#include <vector>
struct tuber
{
    int z;
    std::vector<double> r;
};

int getZ(const tuber& t)
{
    return t.z;
}

boost::variant<std::string, tuber> GrowPotato()
{
    int z = std::rand() / (RAND_MAX / 10);
    if (z < 2)
    {
        return "BAD POTATO";
    }
    else
    {
        tuber ret;
        ret.z = z;
        ret.r.resize(10000);
        for (int i = 0;i < 10000;++i)
        {
            ret.r[i] = std::rand() / (RAND_MAX / 50);
        }
        return ret;
    }
}



std::vector<boost::variant<std::string,tuber>> GrowPotatoes(int n)
{

    std::vector<boost::variant<std::string, tuber>> ret;
    ret.resize(n);
    for (int i = 0; i < n; ++i)
    {
        ret[i] = GrowPotato();
    }

    return ret;
}

//could make this more efficient.
std::pair<std::vector<std::string>,std::multimap<int, tuber>> PickPotatoes(std::vector <boost::variant<std::string, tuber>> result)
{
    std::pair<std::vector<std::string>,std::multimap<int,tuber>> ret;
    int numTypTwo = 0;
    for (const auto& item : result)
    {
        numTypTwo += item.which();
    }
    ret.first.resize(result.size() - numTypTwo);
    int fstSpot = 0;
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (result[i].which())
        {
            ret.second.insert(std::pair<int, tuber>(getZ(boost::get<tuber>(result[i])), boost::get<tuber>(result[i])));
        }
        else
        {
            ret.first[fstSpot++] = std::move(boost::get<std::string>(result[i]));
        }
    }
    return ret;
}
int main()
{
    std::srand(0);
    std::vector<boost::variant<std::string, tuber>>  q= GrowPotatoes(5000);
    std::pair<std::vector<std::string>, std::multimap<int, tuber>> z = PickPotatoes(q);
    return 0;
}

Solution

  • The simplest win is to move the argument at the call site, since PickPotatoes takes its parameter by value:

    std::pair<std::vector<std::string>, std::multimap<int, tuber>> z = PickPotatoes(std::move(q));
    

    In my benchmarks this alone gains roughly 14%. Anything beyond that depends heavily on what the data means and how it is going to be used.

    Focus on reducing allocations: use a non-node-based container if you can (e.g. boost::container::flat_multimap, or a plain vector that you sort explicitly), use string_view, and parse directly into the desired data structure instead of an intermediate one.

    BONUS

    I was able to shave off about 30% using:

    std::pair<std::vector<std::string>, std::multimap<int, tuber> >
    PickPotatoes(std::vector<boost::variant<std::string, tuber> >&& result) {
    
        std::pair<std::vector<std::string>, std::multimap<int, tuber> > ret;
    
        ret.first.reserve(result.size());
    
        struct Vis {
            using result_type = void;
    
            void operator()(std::string& s) const {
                first.emplace_back(std::move(s));
            }
            void operator()(tuber& tbr) const {
                second.emplace(tbr.z, std::move(tbr));
            }
    
            std::vector<std::string>& first;
            std::multimap<int, tuber>& second;
        } visitor { ret.first, ret.second };
    
        for (auto& element : result) {
            boost::apply_visitor(visitor, element);
        }
    
        return ret;
    }
    

    The gains come from using emplace with a moved tuber, avoiding the repeated get<> calls, and dropping the separate counting loop that was only there to pre-size ret.first.
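
To make the "non-node-based container, sort explicitly" suggestion and the move-instead-of-copy point concrete, here is a minimal sketch. It is not the benchmarked code from above. It assumes a plain std::vector<tuber> as input (no variant, so it stays Boost-free) and swaps in a sorted std::vector of pairs where flat_multimap or multimap would otherwise be used. SortedByZ and the copy counter inside tuber are hypothetical names added purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the question's tuber, instrumented to count copies.
// The counter and the extra constructors are illustration-only additions.
struct tuber {
    int z = 0;
    std::vector<double> r;
    static int copies;

    tuber() = default;
    tuber(int z_, std::size_t n) : z(z_), r(n) {}
    tuber(const tuber& o) : z(o.z), r(o.r) { ++copies; }
    tuber(tuber&&) noexcept = default;
    tuber& operator=(const tuber& o) { z = o.z; r = o.r; ++copies; return *this; }
    tuber& operator=(tuber&&) noexcept = default;
};
int tuber::copies = 0;

using Entry = std::pair<int, tuber>;

// Hypothetical helper: moves every tuber out of `input` into one contiguous
// vector, then sorts it by key. std::stable_sort keeps equal keys in
// insertion order, which is also what std::multimap does since C++11.
std::vector<Entry> SortedByZ(std::vector<tuber>&& input) {
    std::vector<Entry> out;
    out.reserve(input.size());  // one allocation for all entries
    for (tuber& t : input) {
        const int key = t.z;    // read the key before moving from t
        out.emplace_back(key, std::move(t));
    }
    const auto byKey = [](const Entry& a, const Entry& b) {
        return a.first < b.first;
    };
    std::stable_sort(out.begin(), out.end(), byKey);
    assert(std::is_sorted(out.begin(), out.end(), byKey));
    return out;
}
```

Called as SortedByZ(std::move(crop)), this leaves tuber::copies unchanged, because every element is moved rather than copied. Looking up all entries for one key would then be a binary search (std::lower_bound and std::upper_bound with a key comparator) rather than multimap::equal_range. Whether it actually beats the multimap version depends on the data, so it is worth measuring.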