Search code examples
opencvk-meansspherical-kmeansangle-to-euclidean-space

opencv: how to clusterize by angle using kmeans()


Question is, how to clusterize pairs of some units by their angle? Problem is that, kmeans operates on the notion of Euclidean space distance and does not know about periodic nature of angles. So to make it work, one needs to translate the angle to Euclidean space but hold the following true:

  1. close angles are close values in Euclidean space;
  2. far angles are far in Euclidean space.

Which means, that, 90 and -90 are distant values, 180 and -180 is the same, 170 and -170 are close (angles come from left up and to right: 0 - +180 and from left down to the right: 0 - -180)

I tried to use various sin() functions but they all have issues mentioned in points 1 and 2. Most perspective one is sin(x * 0.5f) but also having the problem that 180 and -180 are distant values in Euclidean space.


Solution

  • The solution I found is to translate angles to points on circle and feed them into kmeans. This way we make it to compare distances between points and this works perfectly.

    Important thing to mention. Kmeans @eps in termination criterion is expressed in terms of units of samples that you feed to kmeans. In our example maximal distant points have dist 200 units (2 * radius). This means that having 1.0f is totally fine. If you use cv::normalize(samples, samples, 0.0f, 1.0f) for your samples before calling kmeans(), adjust your @eps appropriately. Something like eps=0.01f plays better here.

    Enjoy! Hope this helps someone.

    static cv::Point2f angleToPointOnCircle(float angle, float radius, cv::Point2f origin /* center */)
    {
            float x = radius * cosf(angle * M_PI / 180.0f) + origin.x;
            float y = radius * sinf(angle * M_PI / 180.0f) + origin.y;
            return cv::Point2f(x, y);
    }
    
    static std::vector<std::pair<size_t, int> > biggestKmeansGroup(const std::vector<int> &labels, int count)
    {
            std::vector<std::pair<size_t, int> > indices;
            std::map<int, size_t> l2cm;
    
            for (int i = 0; i < labels.size(); ++i)
                    l2cm[labels[i]]++;
    
            std::vector<std::pair<size_t, int> > c2lm;
            for (std::map<int, size_t>::iterator it = l2cm.begin(); it != l2cm.end(); it++)
                    c2lm.push_back(std::make_pair(it->second, it->first)); // count, group
    
            std::sort(c2lm.begin(), c2lm.end(), cmp_pair_first_reverse);
            for (int i = 0; i < c2lm.size() && count-- > 0; i++)
                    indices.push_back(c2lm[i]);
            return indices;
    }
    
    static void sortByAngle(std::vector<boost::shared_ptr<Pair> > &group,
                            std::vector<boost::shared_ptr<Pair> > &result)
    {
            std::vector<int> labels;
            cv::Mat samples;
    
            /* Radius is not so important here. */
            for (int i = 0; i < group.size(); i++)
                    samples.push_back(angleToPointOnCircle(group[i]->angle, 100, cv::Point2f(0, 0)));
    
            /* 90 degrees per group. May be less if you need it. */
            static int PAIR_MAX_FINE_GROUPS = 4;
    
            int groupNr = std::max(std::min((int)group.size(), PAIR_MAX_FINE_GROUPS), 1);
            assert(group.size() >= groupNr);
            cv::kmeans(samples.reshape(1, (int)group.size()), groupNr, labels,
                       cvTermCriteria(CV_TERMCRIT_EPS/* | CV_TERMCRIT_ITER*/, 30, 1.0f),
                       100, cv::KMEANS_RANDOM_CENTERS);
    
            std::vector<std::pair<size_t, int> > biggest = biggestKmeansGroup(labels, groupNr);
            for (int g = 0; g < biggest.size(); g++) {
                    for (int i = 0; i < group.size(); i++) {
                            if (labels[i] == biggest[g].second)
                                    result.push_back(group[i]);
                    }
            }
    }