I am trying to implement a boost::multi_index application and performance is very bad: inserting 10,000 objects takes almost 0.1 seconds, which is not acceptable. Looking into the documentation, I found that boost::multi_index can accept a memory allocator as its last template parameter, but I got a lot of compile errors when I tried to use one myself. Please help me correct this. Thanks.
struct order
{
    unsigned int id;
    unsigned int quantity;
    double       price;
};

struct id{};
struct price{};

typedef multi_index_container<
    order,
    indexed_by<
        hashed_unique<
            tag<id>,BOOST_MULTI_INDEX_MEMBER(order,unsigned int,id)>,
        ordered_non_unique<
            tag<price>,BOOST_MULTI_INDEX_MEMBER(order,double,price),
            std::less<double> >
    >,
    boost::object_pool<order>
> order_sell;
In general, the compiler does not accept the expression boost::object_pool as an allocator in the order_sell declaration.
Let me re-state Alexander's suggestion that you profile your program in order to understand where the problem really lies. I strongly doubt Boost.MultiIndex per se can be as slow as you report. The following program measures the time taken to create an order_sell container (without Boost.Pool), populate it with 10,000 random orders and destroy it:
#include <algorithm>
#include <array>
#include <chrono>
#include <numeric>

std::chrono::high_resolution_clock::time_point measure_start,measure_pause;

template<typename F>
double measure(F f)
{
    using namespace std::chrono;
    static const int          num_trials=10;
    static const milliseconds min_time_per_trial(200);
    std::array<double,num_trials> trials;
    volatile decltype(f()) res; /* to avoid optimizing f() away */

    for(int i=0;i<num_trials;++i){
        int                               runs=0;
        high_resolution_clock::time_point t2;

        measure_start=high_resolution_clock::now();
        do{
            res=f();
            ++runs;
            t2=high_resolution_clock::now();
        }while(t2-measure_start<min_time_per_trial);
        trials[i]=duration_cast<duration<double>>(t2-measure_start).count()/runs;
    }
    (void)res; /* suppress unused-variable warning */

    /* discard the two best and two worst trials, average the rest */
    std::sort(trials.begin(),trials.end());
    return std::accumulate(
        trials.begin()+2,trials.end()-2,0.0)/(trials.size()-4);
}

void pause_timing()
{
    measure_pause=std::chrono::high_resolution_clock::now();
}

void resume_timing()
{
    measure_start+=std::chrono::high_resolution_clock::now()-measure_pause;
}
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

using namespace boost::multi_index;

struct order
{
    unsigned int id;
    unsigned int quantity;
    double       price;
};

struct id{};
struct price{};

typedef multi_index_container<
    order,
    indexed_by<
        hashed_unique<
            tag<id>,BOOST_MULTI_INDEX_MEMBER(order,unsigned int,id)>,
        ordered_non_unique<
            tag<price>,BOOST_MULTI_INDEX_MEMBER(order,double,price),
            std::less<double> >
    >
> order_sell;
#include <iostream>
#include <random>

int main()
{
    std::cout<<"Insertion of 10,000 random orders plus container cleanup\n";
    std::cout<<measure([](){
        order_sell os;
        std::mt19937 gen{34862};
        std::uniform_int_distribution<unsigned int> uidist;
        std::uniform_real_distribution<double>      dbdist;
        for(unsigned int n=0;n<10000;++n){
            os.insert(order{uidist(gen),0,dbdist(gen)});
        }
        return os.size();
    })<<" seg.\n";
}
When run in -O3 mode with whatever backend Coliru uses, we get:

Insertion of 10,000 random orders plus container cleanup
0.00494657 seg.

VS 2015 release mode on my machine (Intel Core i5-2520M @ 2.50GHz) yields:

Insertion of 10,000 random orders plus container cleanup
0.00492825 seg.
So, this is around 20 times faster than you report, and I'm including container destruction and random number generation in my measurements.
A couple of additional observations:

- boost::object_pool does not provide the allocator interface the standard library specifies for interoperability with containers. You might want to use boost::pool_allocator instead (I've played with it a bit and it doesn't seem to improve speed, though your mileage may vary).
- Regarding memory consumption: in an order_sell container with a hashed index and an ordered index, each value goes into one node of its own, plus there is a separate so-called bucket array (an array of pointers) with a length roughly equal to the number of elements. You can't get better than that for a node-based data structure (if you were willing to dispense with iterator stability you could devise more memory-efficient schemes).