How to make a neural net give position?

I understand how to do classification problems and starting to understand convolution networks which I think is the answer to some extent. I'm a bit confused on how to setup a network to give me the output position.

Let's say you have the position of the end point of noses for a data set with faces. To find the end point do you just do a 'classification' type problem where your output layer is something like 64x64 = 4096 points but if the nose is at point row 43 and column 20 of your grid you just set the output as all zero's except for at element 43*64 + 20 = 2772 where you set it equal to 1? Then just map it back to your image dimensions.

I can't find much info on how this part of identification works and this is my best guess. I'm working towards a project at the second with this methodology, but it is going to be a lot of work and want to know if I'm at least on the right track. This seems to be a solved problem, but I just can't seem to find how people do this.

Solution

Although what you describe could feasibly work, generally neural networks (convolutional and otherwise) are not used to determine the position of a feature in an image. In particular, Convolutional Neural Networks (CNNs) are specifically designed to be translation invariant so that they will detect features regardless of their position in the input image - this is sort of the inverse of what you're looking for.

One common and effective solution for the kind of problem you're describing is a cascade classifier. They have some limitations, but for the kind of application you're describing, it would probably work quite well. In particular, cascade classifiers are designed to provide good performance owing to the staged approach in which most sections of the input image are very quickly dismissed by the first couple stages.

Don't get me wrong, it may be interesting to experiment with using the approach you described; just be aware that it may prove difficult to get it to scale well.