I am working on a project for uni which requires markerless relative pose estimation. To do this I take two images and match n features in certain locations of the picture. From these points I can find vectors between these points which, when included with distance, can be used to estimate the new postition of the camera.
The project is required to be deplyoable on mobile devices so the algorithm needs to be efficient. A thought I had to make it more efficient would be to take these vectors and put them into a Neural Network which could take the vectors and output an estimation of the xyz movement vector based on the input.
The question I have is if a NN could be appropriate for this situation if sufficiently trained? and, if so, how would I calculate the number of hidden units I would need and what the best activation function would be?
Using a neural network for your application can very well work, however, I feel you will need a lot of training samples to allow the network to generalize. Of course, this also depends on the type and number of poses you're dealing with. It sounds to me that with some clever maths it might be possible to derive the movement vector directly from the input vector -- if by any chance you can come up with a way of doing that (or provide more information so others can think about it too), that would very much be preferred, as in that case you would include prior knowledge you have about the task instead of relying on the NN to learn it from data.
If you decide to go ahead with the NN approach, keep the following in mind: