Principal Component Analysis - Dimensionality Reduction

PCA is usually described as a way to reduce the dimensionality of data. For example, I have 2-d data and used PCA to reduce it to 1-d.

Now, the first component is chosen so that it captures the maximum variance. What does it mean for the 1st component to have maximum variance?

Also, if we take 3-d data and reduce it to 2-d, will the 1st component have its maximum variance along the x-axis or the y-axis?

Solution

PCA works by first centering the data at the origin (subtracting the mean from each data point) and then rotating it to line up with the axes (diagonalizing the covariance matrix into a “variance” matrix whose diagonal entries are the variances along the new axes). The components are sorted so that the diagonal of the variance matrix is in descending order, which means the first component has the largest variance, the second has the next largest, and so on. To reduce dimensionality, you then squish the data by zeroing out the less important components (projecting onto the leading principal components). Undoing the rotation and the centering maps the result back to the original coordinates.
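The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name `pca_reduce`, the parameter `k`, and the synthetic data are all made up for the example, and the diagonalization is done with an eigendecomposition of the covariance matrix (libraries often use an SVD instead).

```python
import numpy as np

def pca_reduce(X, k):
    """Keep the top-k principal components of X (n_samples x n_features)
    and map the result back to the original coordinates.
    Hypothetical helper, written for illustration only."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the data at the origin

    cov = np.cov(Xc, rowvar=False)                 # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns eigenvalues in ascending order

    order = np.argsort(eigvals)[::-1]              # re-sort so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                          # rotate the data onto the principal axes
    scores[:, k:] = 0.0                            # zero out the less important components

    X_hat = scores @ eigvecs.T + mean              # undo the rotation, then the centering
    return eigvals, eigvecs, X_hat

# Made-up 2-d data, reduced to 1-d.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
eigvals, eigvecs, X_hat = pca_reduce(X, k=1)
```

Here `eigvals` plays the role of the diagonal of the “variance” matrix, and `X_hat` is the squished data: it still has two coordinates per point, but every point lies on a single line through the mean.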

To answer your questions:

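1. The first component having the maximum variance means that its entry on the diagonal of the variance matrix is the largest one. Equivalently, the data are more spread out along that direction than along any other direction.
2. It depends on what you call your axes. If you mean the original x- and y-axes, then in general neither: the first component points in whichever direction the data vary most, which is usually a combination of the original axes. If you mean the axes after the rotation, then the first component is the first of those axes by construction.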
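To check both answers numerically, here is a small NumPy sketch. The data are made up (points stretched along the diagonal, so neither the x-axis nor the y-axis is special), and all variable names are just for illustration. It relies on the fact that the diagonal entries of the variance matrix are the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-d data stretched along the diagonal direction.
t = rng.normal(size=500)
X = np.column_stack([t + 0.3 * rng.normal(size=500),
                     t + 0.3 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)                      # center at the origin

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
first = eigvecs[:, -1]                       # eigh sorts ascending, so the last column is the 1st component
largest = eigvals[-1]                        # largest entry of the "variance" matrix

# Variance of the data along the first component and along the original axes.
var_along_first = np.var(Xc @ first, ddof=1)
var_along_x = np.var(Xc[:, 0], ddof=1)
var_along_y = np.var(Xc[:, 1], ddof=1)

# Variance along a sweep of unit directions between 0 and 180 degrees.
angles = np.linspace(0.0, np.pi, 181)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
var_by_direction = np.var(Xc @ dirs.T, axis=0, ddof=1)

print("first component direction:", first)
print("variance along it:", var_along_first, "largest eigenvalue:", largest)
print("variance along x, y:", var_along_x, var_along_y)
print("max variance over swept directions:", var_by_direction.max())
```

The variance along the first component matches the largest eigenvalue, no swept direction exceeds it, and for this data the direction has sizeable x and y parts, so it lines up with neither original axis.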

Source: Probability and Statistics for Computer Science by David Forsyth.