What is an intuitive explanation of the FFT algorithm? Where does it cut corners/optimize? How does it take phase into account?
An FFT is just a way to speed up a DFT, from O(N*N) to O(NlogN) by factorizing a complex matrix transform. It actually does not cut corners, but is not only faster, but usually slightly more accurate than a DFT due to less opportunities for arithmetic rounding errors or quantization noise via less total operations.
For strictly real input, a DFT is just N correlations against a set of orthogonal sine waves and N correlations against a set of orthogonal cosine waves, over a range of frequencies from DC to the half the sample rate. (In complex form, the cosine correlations being the real part, and the sine correlations the imaginary component). Those correlations can all be performed by an N*N complex matrix multiply.
The phase is just the relationship (atan2()) of the cosine correlations (even portion), and sine wave correlations (odd portion), of the input waveform. e.g. a waveform perfectly symmetric around the DFT window center will have a phase of zero, and a waveform perfectly antisymmetric around the center will have a phase of pi or -pi. And any (non-pathological real) waveform can be decomposed into a pair of even and odd waveforms, and thus has some phase (via atan2()) of each sinusoidal component, referenced to the edges of the DFT window (or to the center, if you do an fftshift beforehand).
For complex input, there is a similar complex arithmetic formulation of all the above, using correlations against unit complex exponentials (instead of sines and cosines).