Search code examples
c++vectorhashtable

Segmentation fault when assigning the return vector of a function to another vector


I'm having a problem with one of my homework assignments in which we need to detect duplicate strings in a vector of strings using a hash table. My code builds and compiles fine but I get a Segmentation fault when I try to assign the return vector from the duplicate detection algorithm to the duplicates vector. I've tried to figure out why this is happening but can't find a solution. I've attached my code below.

Function to find duplicates using hash table ##

 std::vector<std::string>find_duplicates_with_hashtable(std::vector<std::string> & strings) {

        std::vector<std::string> dups;
        typedef std::unordered_map<std::string, std::string> hashtable;
        hashtable table;

        for (std::vector<std::string>::iterator i = strings.begin(); i < strings.end(); i++) {
        std::unordered_map<std::string, std::string>::const_iterator it = table.find(*i);
        if (it != table.end() && (std::find(dups.begin(), dups.end(), *i)) == dups.end()) {
            dups = find_duplicates_with_sorting(dups); // line causing the problem
        }
        table.emplace(*i, *i);
        }

        return dups; 
  }

Function used to check if any elements in the given vector are present in the duplicates vector

std::vector<std::string> find_duplicates_with_sorting(std::vector<std::string> & strings) {
    std::vector<std::string> dups;

    std::sort(strings.begin(), strings.end());

    for( unsigned int i = 0; i < strings.size() - 1; ++i ) {
        if( strings[i].compare(strings[i+1]) == 0 ) {
            std::string found_dup = strings[i];
            if( dups.size() == 0 ) {
                dups.push_back(found_dup);
            }
            else
            {
                std::string last_found_dup = dups[ dups.size() - 1 ];
                if( last_found_dup.compare(found_dup) != 0 ) {              // Not a dup of a dup
                    dups.push_back(found_dup);
                }
            }
        }
    }

    return dups;
}

This is the context in which the hash table function is being called

TEST(BaseHash, SuperShortVector)
{
    std::vector<std::string> dups_found;
    auto & search_vector      = super_short_vector;
    auto & known_dups_vector  = super_short_vector_dups;

    dups_found = find_duplicates_with_hashtable(search_vector);

    std::sort(dups_found.begin(), dups_found.end());
    std::sort(known_dups_vector.begin(), known_dups_vector.end());


}

The line causing the problem is marked by a comment in the 'find_duplicates_with_hashtable' function

Also, since this is a homework assignment, I would greatly appreciate if someone could explain what I did wrong and just give me a general direction I could work towards in order to fix the problem since just copy-pasting code wouldn't help me learn

Sorry if the code is horrible. I'm having trouble understanding how to use hash tables.

Thank you :)


Solution

  • The segfault is happening here:

    for( unsigned int i = 0; i < strings.size() - 1; ++i ) {
            if( strings[i].compare(strings[i+1]) == 0 ) {
    

    The issue is that you are comparing an unsigned value, i, with the unsigned value returned from strings.size() - 1. When strings.size() is 0, this part i < strings.size() - 1 will be checking if i is less than the greatest integer value, which will (basically) always be true.

    This causes strings[i+1] to segfault when strings is length 0 or 1.

    This can be fixed in many ways, but for( int i = 0; i < (int)strings.size() - 1; ++i ) { would be a quick and dirty way to fix it.