
NVIDIA optical flow algorithm producing vectors with unreasonably large magnitudes/directions


I made a simple test movie to play with the optical flow SDK. The movie is two textured rectangles, one stationary, the other moving from left to right and back.

[Image: a frame of the test movie with the computed flow vectors drawn on top]

Working in 8-bit greyscale, after acquiring the flow I divide the vector components by 32.0f to convert from fixed point to float. I'm using NV_OF_OUTPUT_VECTOR_GRID_SIZE_1 at this stage, along with NV_OF_PERF_LEVEL_SLOW. I'm iterating over the image in steps of 32, taking the flow vector at each position and drawing it onto a bitmap. I present the frames of the original video one at a time, with the most recent frame becoming the reference frame for the next iteration.
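For concreteness, the conversion step looks roughly like the sketch below. The `FlowVector` struct is a stand-in for the SDK's `NV_OF_FLOW_VECTOR`, assumed here to hold two signed 16-bit components in S10.5 fixed point (which is why dividing by 32.0f recovers pixel units); check your version of the SDK headers for the exact type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the SDK's flow vector type: two signed 16-bit components,
// assumed to be S10.5 fixed point (5 fractional bits).
struct FlowVector {
    int16_t flowx;
    int16_t flowy;
};

struct FlowVectorF {
    float fx;
    float fy;
};

// Convert the fixed-point output buffer to float pixel displacements.
// With NV_OF_OUTPUT_VECTOR_GRID_SIZE_1 there is one vector per pixel,
// so width * height entries are expected.
std::vector<FlowVectorF> toFloatFlow(const FlowVector* flow, int width, int height)
{
    std::vector<FlowVectorF> out(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i].fx = flow[i].flowx / 32.0f;  // S10.5 -> float pixels
        out[i].fy = flow[i].flowy / 32.0f;
    }
    return out;
}
```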

Would anyone care to speculate or give me some ideas as to why the vectors look like this? Is there a problem at the boundary, for example? They appear to be very large. Do I have to normalise them somehow, beyond dividing the components by 32? What mistake do you think I've made here? It's more apparent in the video linked above.


Solution

  • There are many ways to calculate optical flow.

    A flow vector requires a visual match between two image patches. Flow vectors are often calculated by block matching or similar methods that rely on local neighborhoods (e.g. Lucas-Kanade); a brute-force block-matching sketch appears after this list. Calculating optical flow is similar to calculating stereo disparity, except that stereo disparity is constrained to a search along a single dimension.

    You can judge the quality of a flow vector by the quality of the match between patches, i.e. by which dimensions are "nailed down" by the patches. Flat patches constrain no dimensions at all, except maybe brightness (brightness constancy assumption). Patches showing only an edge constrain motion only perpendicular to that edge, because the patch can slip along the edge (the aperture problem). Patches containing "corners" constrain both spatial dimensions.

    On untextured/flat image patches, any arbitrary flow hypothesis is "plausible". Such flow can be considered "weak" because lots of other flow hypotheses are equally plausible.

    The flat black background in your picture should be considered as not supporting any particular flow. It might or might not make sense to assume flow there; that depends on the larger structure of the image. Some algorithms take well-supported flow and interpolate it over areas of weak support. Some algorithms claim zero flow for unsupported areas. All of this is guesswork. Strictly optically speaking, those areas should have no flow, because the "null hypothesis" cannot be rejected. Assuming a "physical" scene with objects, the object points may move, but if the objects are untextured, you can't see them moving.

    You might want to look at the "Harris" corner detector, or at least at parts of its construction. First, the gradients of the image are calculated. Then, for every pixel, the covariance matrix of the gradients over a local neighborhood is calculated. Finally, the eigenvalues of each matrix are assessed: both need to be large for that point to be "cornery". You want patches that contain enough cornery pixels.

    You can sidestep all this by using pictures that are textured everywhere, instead of only the objects having texture. Failing that, you'd want to erase unsupported flow vectors, or simply not plot them; a sketch of that filtering, built on the corner response described above, also follows below.
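To illustrate the point about block matching and flat patches, here is a rough, unoptimized sketch of SAD block matching; the patch size and search radius are arbitrary illustrative values, not anything the NVIDIA SDK exposes. On a flat patch every candidate displacement produces (nearly) the same cost, so whichever candidate happens to win is essentially arbitrary, which is one way vectors with large, meaningless magnitudes can appear.

```cpp
#include <cstdint>
#include <cstdlib>
#include <climits>

struct Displacement { int dx; int dy; };

// Brute-force block matching: find the displacement (dx, dy) within a search
// radius that minimizes the sum of absolute differences (SAD) between a patch
// in the reference frame and a patch in the current frame.
Displacement bestMatch(const uint8_t* ref, const uint8_t* cur, int width, int height,
                       int px, int py, int patch = 8, int radius = 16)
{
    long bestCost = LONG_MAX;
    Displacement best{0, 0};
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            long cost = 0;
            for (int y = 0; y < patch; ++y) {
                for (int x = 0; x < patch; ++x) {
                    int rx = px + x,      ry = py + y;       // reference patch pixel
                    int cx = px + x + dx, cy = py + y + dy;  // candidate patch pixel
                    if (rx >= width || ry >= height ||
                        cx < 0 || cy < 0 || cx >= width || cy >= height) {
                        cost += 255;  // penalize out-of-bounds candidates
                        continue;
                    }
                    cost += std::abs(int(ref[ry * width + rx]) - int(cur[cy * width + cx]));
                }
            }
            if (cost < bestCost) { bestCost = cost; best = {dx, dy}; }
        }
    }
    return best;  // on a textureless patch this is effectively a coin toss
}
```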
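And as a sketch of the texture check plus the filtering advice, here is one way to do it with OpenCV's cv::cornerMinEigenVal: keep a flow vector only where the smaller eigenvalue of the structure tensor is reasonably large. The block size, quality level, and drawing step are assumptions made to keep the example concrete, and the flow is assumed to have already been converted to per-pixel float matrices.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Build a "texture support" mask from the Harris-style structure tensor:
// keep only pixels whose smaller eigenvalue is large, i.e. patches that
// constrain flow in both spatial dimensions.
cv::Mat textureSupportMask(const cv::Mat& grey8, int blockSize = 7, double qualityLevel = 0.05)
{
    cv::Mat eig;
    cv::cornerMinEigenVal(grey8, eig, blockSize);  // smaller eigenvalue per pixel
    double maxEig = 0.0;
    cv::minMaxLoc(eig, nullptr, &maxEig);
    return eig > qualityLevel * maxEig;            // 255 where well supported, 0 elsewhere
}

// When plotting, skip vectors that land on unsupported pixels.
void drawSupportedFlow(cv::Mat& canvas, const cv::Mat& flowX, const cv::Mat& flowY,
                       const cv::Mat& mask, int step = 32)
{
    for (int y = 0; y < canvas.rows; y += step) {
        for (int x = 0; x < canvas.cols; x += step) {
            if (!mask.at<uchar>(y, x))
                continue;  // flat patch: any flow hypothesis is as good as another
            cv::Point2f d(flowX.at<float>(y, x), flowY.at<float>(y, x));
            cv::arrowedLine(canvas, cv::Point(x, y),
                            cv::Point(cvRound(x + d.x), cvRound(y + d.y)),
                            cv::Scalar(0, 255, 0));
        }
    }
}
```

Thresholding the minimum eigenvalue relative to its maximum mirrors what OpenCV's goodFeaturesToTrack does with its quality parameter, so the values above play the same role as a feature-quality cutoff.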