Tags: c++, performance, unordered-set

Filling unordered_set is too slow


We have a given 3D mesh and are trying to eliminate duplicate vertices. For this we use a self-defined struct containing the coordinates of a vertex and the corresponding normal.

    struct vertice
    {
        float p1,p2,p3,n1,n2,n3;

        bool operator == (const vertice& vert) const
        {
            return (p1 == vert.p1 && p2 == vert.p2 && p3 == vert.p3);
        }
    };

After filling each vertex with data, it is inserted into a std::unordered_set to remove the duplicates.

    struct hashVertice
    {
        size_t operator () (const vertice& vert) const
        {
            return(7*vert.p1 + 13*vert.p2 + 11*vert.p3);
        }
    };

    std::unordered_set<vertice,hashVertice> verticesSet;

    vertice vert;
    unsigned int i = 0;

    while(i < scene->mMeshes[0]->mNumVertices){

        vert.p1 = (float)scene->mMeshes[0]->mVertices[i].x;
        vert.p2 = (float)scene->mMeshes[0]->mVertices[i].y;
        vert.p3 = (float)scene->mMeshes[0]->mVertices[i].z;

        vert.n1 = (float)scene->mMeshes[0]->mNormals[i].x;
        vert.n2 = (float)scene->mMeshes[0]->mNormals[i].y;
        vert.n3 = (float)scene->mMeshes[0]->mNormals[i].z;

        verticesSet.insert(vert);

        i = i + 1;
    }

We found that this is too slow for data sets of around 3,000,000 vertices: even after 15 minutes the program had not finished. Is there a bottleneck we are not seeing, or is another data structure better suited for such a task?


Solution

  • What happens if you just remove verticesSet.insert(vert); from the loop?

    If it speeds up dramatically (as I expect it would), your bottleneck is in the guts of the std::unordered_set, which is a hash table, and the main potential performance problem with hash tables is excessive hash collisions (a quick way to check this is sketched at the end of this answer).

    In your current implementation, if p1, p2 and p3 are small, the number of distinct hash codes will be small (since the float sum is truncated to an integer when it is returned as size_t), so there will be lots of collisions.

    If the above assumptions turn out to be true, I'd try to implement the hash function differently (e.g. multiply by much larger coefficients).
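
    Below is a minimal sketch of one such alternative, assuming the vertice struct from the question stays unchanged: instead of summing the raw coordinates, it feeds each coordinate through std::hash<float> and mixes the results in the style of boost::hash_combine, so that small or nearby values no longer collapse to the same integer. Only the positions are hashed, to stay consistent with operator==.

        #include <cstddef>
        #include <functional>

        struct hashVertice
        {
            size_t operator () (const vertice& vert) const
            {
                std::hash<float> hasher;

                // Hash each coordinate on its own, then mix the results so
                // that distinct (p1, p2, p3) triples spread across buckets.
                size_t h = hasher(vert.p1);
                h ^= hasher(vert.p2) + 0x9e3779b9 + (h << 6) + (h >> 2);
                h ^= hasher(vert.p3) + 0x9e3779b9 + (h << 6) + (h >> 2);
                return h;
            }
        };

    The 0x9e3779b9 constant and the shifts are the classic boost::hash_combine mixing step; any reasonable way of combining the three per-component hashes should do.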


    Other than that, profile your code, as others have already suggested.
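
    To check whether collisions specifically are what makes the insertions slow, the container's own bucket interface can be inspected after the loop. The following is only a diagnostic sketch (reportBuckets is a hypothetical helper, not part of the original code), assuming the vertice and hashVertice definitions from the question: if a handful of buckets hold most of the elements, the hash is the bottleneck.

        #include <algorithm>
        #include <cstddef>
        #include <iostream>
        #include <unordered_set>

        void reportBuckets(const std::unordered_set<vertice, hashVertice>& verticesSet)
        {
            // Find the most heavily loaded bucket; with a well-behaved hash
            // it should stay close to the average load factor.
            std::size_t largest = 0;
            for (std::size_t b = 0; b < verticesSet.bucket_count(); ++b)
                largest = std::max(largest, verticesSet.bucket_size(b));

            std::cout << "elements: "         << verticesSet.size()
                      << ", buckets: "        << verticesSet.bucket_count()
                      << ", load factor: "    << verticesSet.load_factor()
                      << ", largest bucket: " << largest << '\n';
        }

    With the original hash and small coordinate values, the largest bucket tends to hold a large fraction of all elements, which turns every insert into a long linear scan of that bucket.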