The problem I am trying to solve is like this: given a blob in an image, I want to get its orientation to guide me draw lines to fill the area. I want the lines to follow the long axis of the area, to use as few lines as possible.
I googled around, and come across PCA (principal component analysis) as a good method to get orientation of blob, by feeding PCA algorithm all the point coordinates:
But copying the exact algorithm, I get very surprising results. Given similar shaped areas, the PCA algorithm returns completely different eigenvectors. They seem perpendicular:
The lines above are rendered by following the slope given by PCA algorithm, with slight random variation.
I don't know much about data processing, what is happening here? How should I address this problem?
Code:
import numpy as np
# I tried passing different set of points to pca:
# 1. Only points at the perimeter of the area
# 2. Random sample of points within the area
# 3. All points in the area
# points are like [(x1, y1), (x2, y2), ... ]
def pca(points):
xs, ys = zip(*points)
x = np.array(xs)
y = np.array(ys)
# Subtract mean from each dimension. We now have our 2xm matrix.
x = x - np.mean(x)
y = y - np.mean(y)
coords = np.vstack([x, y])
# Covariance matrix and its eigenvectors and eigenvalues
cov = np.cov(coords)
vals, evecs = np.linalg.eig(cov)
# Sort eigenvalues in decreasing order (we only have 2 values)
sort_indices = np.argsort(vals)[::-1]
evec1, evec2 = evecs[:, sort_indices] # Eigenvector with largest eigenvalue
eval1, eval2 = vals[sort_indices[0]], vals[sort_indices[1]]
print("PCA:", evec1, evec2, eval1, eval2)
return evec1, evec2, eval1, eval2
I tried passing different set of points to pca:
But it doesn't matter, with each of the point selection, my algorithm can produce the above 2 patterns. Although it seems the one on the right (the incorrect one) are produced more often.
The mistake is in this line:
evec1, evec2 = evecs[:, sort_indices]
The eigenvectors are in the columns, but that assignment assigns the first row of evecs[:, sort_indices]
to evec1
and the second row to evec2
. A quick fix is
evec1, evec2 = evecs[:, sort_indices].T