Search code examples
c++visual-c++openmpknn

parallel K nearest neighbor using openmp and segmentation fault


I'm trying to do k nearest neighbor (KNN) of the data points in "dat", so my first step is to construct a distance matrix between each point and all the other points, then for each point find the K nearest neighbor. The following code works perfect in serial without openmp. However, when I use openmp it gives a segmentation fault. I think this error has to do with how I update smallest which contains the index of the k smallest elements. I thought may be I need to use "reduction" with the vector smallest, but I'm not sure how to use it or is it right or wrong, so any help on how to overcome this segmentation fault is really appreciated.

vector<vector<double> > dist(dat.size(), vector<double>(dat.size()));
size_t p,j;
ptrdiff_t i;
vector<double> sumKnn;
vector<vector<int > > smallest(dat.size(), vector<int>(k));
#pragma omp parallel for private(p,j,i) default(shared)
for(p=0;p<dat.size();++p)
{
    int mycont=0;
    for (j = p+1; j < dat.size(); ++j)
    {
        double ecl = 0.0;
        for (i = 0; i < c; ++i)
        {
            ecl += (dat[p][i] - dat[j][i]) * (dat[p][i] - dat[j][i]);
        }
        ecl = sqrt(ecl);
        dist[p][j] = ecl;
        dist[j][p] = ecl;
        int index=0; 
        if(mycont<k && j!=p)
        {
            smallest[p][j-p-1]=j;
            mycont++;
        }
        else
        {
            double max=0.0;
            int index=0;
            for(int i=0;i<smallest[p].size();i++)
            {
                if(max < dist[p][smallest[p][i]])
                {
                    index=i;
                    max=dist[p][smallest[p][i]];
                } 
            }
            if(max>dist[p][j])
            {
                smallest[p].erase(smallest[p].begin()+index);
                smallest[p].push_back(j);
            }
        }        
    }
double sum=0.0;
for(int r=0;r<k;r++)
    sum+= dist[p][smallest[p][r]];
sumKnn.push_back(sum);
}

Solution

  • You can just use "critical" directive:

    #pragma omp critical
    {
    smallest[p].erase(smallest[p].begin()+index);
    smallest[p].push_back(j); 
    }
    

    and

    #pragma omp critical
    sumKnn.push_back(sum);
    

    But I agree, that better is to use kd-tree or k-means tree istead of parallelization. You can just download FLANN library http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN.