I am trying to figure out how SURF feature detection works, and I think I have made some progress. I would like to know how far off I am from what's really going on.
A template image that you already have stored and a real-world image are compared on the basis of "keypoints", i.e. some important features in the two images.
The smallest Euclidean distance between the same points constitutes a good match.
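Roughly, in OpenCV terms, I picture that comparison like this (a minimal sketch of my understanding; SIFT stands in for SURF here since SURF needs the non-free contrib build, and the file names are made up):

```python
import cv2

# Hypothetical file names; replace with your own images
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# SIFT as a stand-in; cv2.xfeatures2d.SURF_create() works the same way
# if your OpenCV build includes the non-free contrib module
detector = cv2.SIFT_create()

# Keypoints (locations) and descriptors (feature vectors) for both images
kp1, des1 = detector.detectAndCompute(template, None)
kp2, des2 = detector.detectAndCompute(scene, None)

# Brute-force matcher using Euclidean (L2) distance between descriptors
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Smallest distances = best candidate matches
print([m.distance for m in matches[:10]])
```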
What constitutes an important feature or keypoint? A corner
(intersection of edges) or a blob (sharp change in intensity).
SURF uses blobs.
It uses the Hessian matrix for blob detection, i.e. for feature extraction.
The Hessian matrix is a matrix of second derivatives: it is used to find the
minima and maxima of the intensity in a given region of the image.
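To make that concrete for myself, here is a minimal determinant-of-Hessian sketch. It uses plain Gaussian second derivatives just to illustrate the blob response, not SURF's actual box-filter approximation:

```python
import numpy as np
from scipy import ndimage

def hessian_blob_response(img, sigma=2.0):
    """Determinant-of-Hessian blob response at a single scale sigma.

    Illustration only: SURF approximates these second derivatives with
    box filters on an integral image."""
    img = img.astype(float)
    Lxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))  # d^2/dx^2
    Lyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))  # d^2/dy^2
    Lxy = ndimage.gaussian_filter(img, sigma, order=(1, 1))  # d^2/dxdy
    # det(H) = Lxx*Lyy - Lxy^2; sigma^4 normalizes the response across scales
    return (sigma ** 4) * (Lxx * Lyy - Lxy ** 2)

# Synthetic example: a bright square blob on a dark background
test = np.zeros((64, 64))
test[28:36, 28:36] = 1.0
resp = hessian_blob_response(test, sigma=3.0)
print(np.unravel_index(resp.argmax(), resp.shape))  # peak near the blob center
```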
SIFT/SURF etc. have 3 stages (a minimal code sketch follows below):

1. Find features/keypoints that are likely to be found again in different images of the same object (SURF uses box filters, as far as I remember). Those features should be scale- and rotation-invariant if possible. Corners, blobs etc. are good, and they are usually searched for at multiple scales.
2. Find the right "orientation" of that point, so that if the image is rotated according to that orientation, both images are aligned with regard to that single keypoint.
3. Compute a "descriptor" that captures what the neighborhood of the keypoint looks like (after orientation) at the right scale.
Now, your Euclidean distance computation is done only on the descriptors, not on the keypoint locations!
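To connect these stages (and the point about descriptors) to code: in OpenCV each detected keypoint already carries its location, scale and orientation, and the descriptor is just a vector computed from the oriented neighborhood. A minimal sketch, with SIFT standing in for SURF (SURF lives in the non-free `xfeatures2d` contrib module) and a made-up file name:

```python
import cv2
import numpy as np

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
detector = cv2.SIFT_create()  # cv2.xfeatures2d.SURF_create() if available

# Step 1 (and 2): scale-space keypoints, each with an assigned orientation
keypoints = detector.detect(img, None)
kp = keypoints[0]
print(kp.pt, kp.size, kp.angle)  # location, scale, orientation in degrees

# Step 3: descriptors computed from the oriented, scaled neighborhood
keypoints, descriptors = detector.compute(img, keypoints)
print(descriptors.shape)         # (n_keypoints, 128) for SIFT, 64 for SURF

# The matching "distance" is between descriptor vectors only;
# the (x, y) keypoint locations play no part in it
print(np.linalg.norm(descriptors[0] - descriptors[1]))
```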
It is important to know that step 1 isn't fixed for SURF. SURF in fact is steps 2-3, but the authors suggest a way to do step 1 that has some synergy with steps 2-3: both step 1 and step 3 use integral images to speed things up, so the integral image has to be computed only once.
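For completeness, an integral image (summed-area table) is what makes those box filters cheap: once it is built, the sum over any rectangle costs four lookups, regardless of the rectangle's size. A minimal NumPy sketch:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))   # 5 + 6 + 9 + 10 = 30
print(img[1:3, 1:3].sum())       # same result, computed directly
```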