I cannot understand from these slides why the SVD is applied to the least squares problem.
And then this follows:
And here I don't understand why the derivative of the residuals was taken. Also, is the idea in that graph to take the projection of y in order to minimize the error?
Here is my humble attempt to explain this...
The first slide does not yet explain how the SVD is related to LS. It states that any ordinary matrix X can be factored, via the singular value decomposition, into two orthogonal matrices and a diagonal matrix (only its diagonal elements, the singular values, are nonzero), which is convenient for computation.
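In symbols (this is the standard statement of the SVD, not something specific to these slides):

$$ X = U \Sigma V^\top, $$

where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal, its diagonal entries being the singular values of $X$.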
Slide 2 shows the computation that is then carried out using this diagonal matrix, as in the sketch below.
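As a concrete illustration, here is a minimal numpy sketch of solving least squares via the SVD. The variable names (X, y, beta) and the synthetic data are my own, not taken from the slides:

```python
import numpy as np

# Small synthetic problem, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # design matrix
y = rng.normal(size=10)        # response vector

# Thin SVD: X = U @ diag(s) @ Vt, with only the diagonal of Sigma nonzero
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Least-squares solution via the pseudoinverse:
# beta = V @ diag(1/s) @ U.T @ y; inverting Sigma is just 1/s elementwise,
# which is why the diagonal form is so convenient
beta = Vt.T @ ((U.T @ y) / s)

# Sanity check against numpy's built-in least-squares solver
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_ref))  # True
```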
The explanation is on slide 3: minimizing the norm of r is equivalent to minimizing its square, which is the RSS (because x -> x^2 is an increasing function for x > 0). Minimizing the RSS works like minimizing any "nice" function: you differentiate it and set the derivative equal to 0.
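Spelling out that differentiation (a standard derivation; I am writing $\beta$ for the coefficient vector, which is my notation, not necessarily the slides'):

$$ \mathrm{RSS}(\beta) = \|r\|^2 = (y - X\beta)^\top (y - X\beta), \qquad \frac{\partial\,\mathrm{RSS}}{\partial \beta} = -2\,X^\top (y - X\beta) = 0 \;\Longrightarrow\; X^\top X \hat\beta = X^\top y. $$

This also answers your question about the graph: the condition $X^\top (y - X\hat\beta) = 0$ says the residual is orthogonal to every column of $X$, so the fitted value $\hat y = X\hat\beta$ is exactly the orthogonal projection of $y$ onto the column space of $X$. Yes, that is the idea.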