Given a std::vector<std::vector<int>>
:
std::vector<std::vector<int>>
std::vector<int>
std::vector<int>
My question is twofold:
I tried the following:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());
std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));
std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());
for(auto&& vector : uniqueResponseVectors)
{
int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
freqTotal.push_back(count);
}
And the output should be:
for(auto&& vector : uniqueResponseVectors)
{
for(auto&& value : vector)
{
std::cout << "\t" << value << "\t";
}
std::cout << "\n";
}
// Printed result for the `uniqueResponseVectors`:
//
// 0 1 2
// 1 2 3
// 2 1 2
// 2 3 2
// 3 2 3
// 3 3 3
// Similarly for the `freqTotal`:
//
// 2
// 6
// 2
// 2
// 4
// 2
Additional context:
when I tried the code above against a larger dataset (i.e., responseVectors
with a size of 100000 and 12 as the size of each std::vector<int>
), it was slow.
I also tried to manipulate responseVectors
directly by calling std::unique
and erase()
on it, in order avoid creating a copy. Then I used an iterator loop to std::count
how many times a unique std::vector<int>
occurs. However, this proved even slower.
user463035818 pointed me to an excellent answer by YSC
and PaulMcKenzie further iterated on the importance of picking the proper data structure(s)
Based on this input, I tried to particularize the solution provided by YSC to my case, in order to better understand what is going on. It boils down to using std::map
which is a sorted associative container:
std::map<std::vector<int>, int> SortAndCountOccurences(std::vector<std::vector<int>>& vectors)
{
std::map<std::vector<int>, int> result;
std::for_each(vectors.begin(), vectors.end(), [&result](auto const& vector) {
++result[vector];
});
return result;
}
With the following usage:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::map<std::vector<int>, int> sortedVectors = SortAndCountOccurences(responseVectors);
Which will output:
for(auto&& vector : sortedVectors)
{
for(auto&& value : vector.first)
{
std::cout << "\t" << value << "\t";
}
std::cout << "-> x" << vector.second << " times" << "\n";
}
// 0 1 2 -> x2 times
// 1 2 3 -> x6 times
// 2 1 2 -> x2 times
// 2 3 2 -> x2 times
// 3 2 3 -> x4 times
// 3 3 3 -> x2 times
Note: The solution by YSC can be generalized to anything iterable.