I came across this question on datacamp.com:
Below are three scatter plots of the same point cloud. Each scatter plot shows a different set of axes (in red). In which of the plots could the axes represent the principal components of the point cloud?
Recall that the principal components are the directions along which the data varies.
Answer: Plots 1 and 3
My question is: what does the question mean? Why is plot 2 not part of the answer, since its axes can be rotated to fit the point cloud?
As suggested in the comments, this is a better fit for Cross Validated, or possibly math.stackexchange.
That said, the answer is intuitively rather simple.
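Principal components can be obtained by an iterative process:

1. The first principal component is $a_1^\top X$, where $a_1$ maximizes $\operatorname{Var}(a_1^\top X)$ subject to $a_1^\top a_1 = 1$.
2. The second principal component is $a_2^\top X$, where $a_2$ maximizes $\operatorname{Var}(a_2^\top X)$ subject to $a_2^\top a_2 = 1$ and $\operatorname{Cov}(a_1^\top X, a_2^\top X) = 0$.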
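As a minimal sketch, assuming two-dimensional data like the point cloud in the question, the pure-Python code below (not from the original answer; the function names and sample points are hypothetical) finds a_1 and a_2 in closed form. It relies on the standard result that the constrained maximization is solved by the eigenvectors of the covariance matrix, with a_1 belonging to the largest eigenvalue.

```python
import math

def covariance(u, v):
    """Sample covariance of two equally long lists (n - 1 denominator)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def project(points, a):
    """Scores a^T x for every 2-D point x."""
    return [a[0] * x + a[1] * y for x, y in points]

def principal_axes_2d(points):
    """Return unit vectors (a_1, a_2): the eigenvectors of the 2x2 sample
    covariance matrix, with a_1 belonging to the larger eigenvalue."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Var(a^T X) for a = (cos t, sin t) peaks at t = atan2(2*sxy, sxx - syy) / 2
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    a_1 = (math.cos(theta), math.sin(theta))
    a_2 = (-math.sin(theta), math.cos(theta))  # unit length, orthogonal to a_1
    return a_1, a_2

# Hypothetical point cloud stretched along the line y = x
points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
a_1, a_2 = principal_axes_2d(points)
s_1, s_2 = project(points, a_1), project(points, a_2)
print("Var(a_1^T X):", covariance(s_1, s_1))
print("Var(a_2^T X):", covariance(s_2, s_2))
print("Cov(a_1^T X, a_2^T X):", covariance(s_1, s_2))  # zero up to rounding
```

The closed-form angle only works in two dimensions; with more variables one would hand the covariance matrix to a general symmetric eigensolver instead. It is used here just to keep the sketch free of dependencies.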
From this definition, note that $\operatorname{Var}(a_1^\top X) = \operatorname{Var}(-a_1^\top X)$, and $-a_1$ satisfies the constraint $(-a_1)^\top(-a_1) = 1$ just as well, so a principal component is only determined up to its sign.
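The sign ambiguity can be checked numerically. In this minimal pure-Python sketch (the direction `a_1` and the points are hypothetical, chosen only for illustration), projecting onto a unit vector and onto its negation gives scores that differ only in sign, and therefore the same variance.

```python
import math

def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def scores(points, a):
    """Project each 2-D point x onto direction a, i.e. compute a^T x."""
    return [a[0] * x + a[1] * y for x, y in points]

points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
a_1 = (1 / math.sqrt(2), 1 / math.sqrt(2))  # hypothetical unit direction
neg_a_1 = (-a_1[0], -a_1[1])                # its negation, also unit length

s_pos = scores(points, a_1)
s_neg = scores(points, neg_a_1)
print(sample_variance(s_pos), sample_variance(s_neg))         # the two variances agree
print(all(abs(p + n) < 1e-12 for p, n in zip(s_pos, s_neg)))  # True: scores only flip sign
```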
From this definition we can see that:

1. Plots 1 and 3 are equivalent, as in both the first (longest) axis lies along the direction in which the points are most spread out (show the greatest variance).
2. Plot 2 cannot show the principal components, as its axes do not line up with the direction of greatest variance. The question asks about the axes as drawn; rotating them to fit the point cloud would produce a different set of axes.
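A hypothetical illustration of why plot 2 fails, assuming an elongated 2-D cloud similar to the one in the question: the sketch below (names and data are assumptions, not from the question) compares the principal axes with the same pair of axes rotated by 30 degrees. The rotated first axis captures less variance, and the scores on the two rotated axes are correlated, so those axes violate the conditions in the definition.

```python
import math

def sample_cov(u, v):
    """Sample covariance of two equally long lists (n - 1 denominator)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def scores(points, angle):
    """Project 2-D points onto the unit vector at the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x + s * y for x, y in points]

def leading_angle(points):
    """Angle of the direction of greatest variance, i.e. of the leading
    eigenvector of the 2x2 sample covariance matrix."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    sxx, syy, sxy = sample_cov(xs, xs), sample_cov(ys, ys), sample_cov(xs, ys)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Hypothetical point cloud stretched along the line y = x
points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
theta = leading_angle(points)

for label, angle in [("principal axes", theta),
                     ("axes rotated by 30 degrees", theta + math.radians(30))]:
    first = scores(points, angle)                 # scores on the first axis
    second = scores(points, angle + math.pi / 2)  # scores on the perpendicular axis
    print(label, "| variance on first axis:", sample_cov(first, first),
          "| covariance between axes:", sample_cov(first, second))
```

Flipping the sign of a principal axis, by contrast, leaves its variance unchanged and keeps the covariance at zero, which is why a sign flip alone does not disqualify a plot.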
Chapter 8 (around page 430) of Applied Multivariate Statistical Analysis contains a more detailed theoretical explanation.