Search code examples
c++vectorfindlarge-data

Find larger string vector in large string vector


In C++, what's the quickest way (or decent way) to check each element in a string vector of size approx. 800,000 to see if it's in another string vector of approx. size 200,000? My goal is to push all the strings of the first that are found in the second into a third.

My beginner attempt is never going to stop running:

vector<string> combosVsWords(vector<string> words, vector<string> lettercombos)
{
    vector<string> firstwords;

    for (int i = 0; i != lettercombos.size(); i++)
    {
        if (find(words.begin(), words.end(), lettercombos[i]) !=   words.end())
            firstwords.push_back(lettercombos[i]);
    }       
}

Solution

  • If the vectors can be sorted, then the following should work using std::set_intersection:

    #include <algorithm>
    #include <vector>
    #include <string>
    #include <algorithm>
    #include <iterator>
    //...
    using namespace std;
    
    vector<string> combosVsWords(vector<string>& words, 
                                 vector<string>& lettercombos)
    {
        vector<string> firstwords;
    
        // Sort the vectors 
        sort(words.begin(), words.end());
        sort(lettercombos.begin(), lettercombos.end());
    
        // get the set intersection of the vectors and place
        // the result in firstwords
        set_intersection(words.begin(), words.end(), lettercombos.begin(), 
                         lettercombos.end(), back_inserter(firstwords));
    
        return firstwords;
    }