The paper "Robust Real-Time Face Detection" by Paul Viola and Michael Jones says: "4916 positive training examples were hand picked aligned, normalized, and scaled to a base resolution of 24x24. 10,000 negative examples were selected by randomly picking sub-windows from 9500 images which did not contain faces."
My question is: what do they mean by "hand picked aligned, normalized, and scaled to a base resolution of 24x24"?
Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces? Does "normalized" mean each of the 4916 images has the same properties (file size, file type, color mode such as grayscale or color)? Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?
Thanks for your time!
Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces?
Not necessarily 4916 distinct people, but yes, they had 4916 different photos of faces. The faces were located manually by a "human expert".
Does "normalized" mean each of the 4916 images has the same properties (file size, file type, color mode such as grayscale or color)?
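They used only grayscale pixels. "Normalized" means they made sure no picture is much darker or much brighter than the others: if a picture was very dark it was automatically brightened, and if it was too bright it was darkened. The paper describes this as variance normalization of each training sub-window, done to minimize the effect of different lighting conditions. It is easy to do automatically.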
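To make the normalization step concrete, here is a minimal sketch of variance normalization on a grayscale window using NumPy. The function name `variance_normalize` and the `eps` guard are my own choices for illustration, not something taken from the paper's code.

```python
import numpy as np

def variance_normalize(window, eps=1e-8):
    """Return a zero-mean, unit-variance copy of a grayscale window."""
    w = np.asarray(window, dtype=np.float64)
    std = w.std()
    # eps is an assumed guard so a completely flat window does not divide by zero.
    return (w - w.mean()) / max(std, eps)

# The same pattern photographed "dark" and "bright" (toy values).
dark = np.array([[10.0, 20.0], [30.0, 40.0]])
bright = dark + 200.0

# After normalization the brightness offset is gone.
same = np.allclose(variance_normalize(dark), variance_normalize(bright))
print(same)
```

Subtracting the mean removes overall brightness, and dividing by the standard deviation removes overall contrast, so two photos of the same face under different lighting end up with comparable pixel values.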
Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?
Yes, they made sure each "face" is exactly 24x24 pixels by cropping the face region and rescaling it.
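As a rough illustration of the crop-and-scale step, the sketch below cuts a square face region out of a larger grayscale image and resamples it to 24x24. The bounding box (`top`, `left`, `size`) is hypothetical and stands in for the hand-labeled face location, and nearest-neighbor resampling is an assumption; the quoted passage does not say which interpolation was used.

```python
import numpy as np

BASE = 24  # base resolution quoted from the paper

def crop_and_scale(image, top, left, size, out=BASE):
    """Crop a square region and resample it to out x out (nearest neighbor)."""
    face = image[top:top + size, left:left + size]
    # Map every output pixel index to a source pixel index in the crop.
    idx = (np.arange(out) * size) // out
    return face[np.ix_(idx, idx)]

# Toy 100x100 "photo" whose pixel value encodes its position (row * 100 + col).
image = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)

# Hypothetical hand-labeled face box: 48x48 pixels starting at row 10, column 20.
patch = crop_and_scale(image, top=10, left=20, size=48)
print(patch.shape)
```

Every training example then has the same 24x24 shape regardless of how large the face was in the original photo, which is what lets the detector use one fixed set of features.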