I am trying to understand why range insertion below is faster than using the iterator.
vector<string> &paths // 3 milion strings
Method 1 : range insert
unordered_set<string> mySet;
mySet.insert(paths.begin(), paths.end());
Method 2 : iterator
vector<string>::iterator row;
for (row = paths.begin(); row != paths.end(); row++)
{
mySet.insert(row[0]);
}
Results :
Method 1 : 753 ms
Method 2 : 1221 ms
==============================
OS: Windows 10
IDE: visual studio code
Compiler: gcc version 8.1.0
Flags : -O3
Intuitively, the range insertion procedure should be faster. Imagine, for example, that you want to insert a million elements. If you do a range insert, the set can
There are some further possible optimizations that could be done here (using a pooled allocator for bulk allocations, doing a multithreaded insertion procedure, etc.), though I’m not sure whether these are actually done.
On the other hand, if you insert things one at a time, each of these steps needs to be done a million times. That means there’s time and space wasted allocating intermediate arrays of buckets that don’t ultimately get used, but which the implementation can’t tell won’t be used because the implementation has to keep things in a good state every step of the way.
For an unordered_set
these optimizations are just improvements to the expected O(1) cost per insertion. In some other containers like vector
or deque
, bulk inserts can be asymptotically faster than repeated individual inserts because the container can move other elements once during the bulk insert rather than doing lots of repeated shifts.
Hope this helps!