Tags: python, python-2.7, opencv, computer-vision, optical-flow

Unknown output of OpenCV's calcOpticalFlowFarneback


I've been wondering what the optical flow matrix that OpenCV's calcOpticalFlowFarneback function returns actually tells me. If I run this Python line:

flow = cv2.calcOpticalFlowFarneback(cv2.UMat(prvs),cv2.UMat(next), None, 0.5, 3, 15, 3, 5, 1.2, 0)

I get a matrix, the same size as the prvs and next frames, containing for each position a two-element vector (x, y). My question is: is that vector the displacement from prvs to next, or from next to prvs?

Thanks.


Solution

  • The general purpose of an optical flow method is to find the velocity of each pixel (if dense) or of each feature point (if sparse) between two images (typically consecutive video frames). The idea is that pixels in frame N-1 move to new positions in frame N, and the displacement between those positions acts like a velocity vector. That means that a pixel at location (x, y) in the previous frame will be at location (x+v_x, y+v_y) in the next frame.

    For the values of pixels, that means that for a given position (x, y), the value of the pixel at prev_frame(x, y) is the same as the value of the pixel at curr_frame(x+v_x, y+v_y). Or more specifically, in terms of actual array indices:

    prev_frame[y, x] == curr_frame[y + flow[y, x, 1], x + flow[y, x, 0]]
    

    Notice the reversed ordering of (x, y) here. Arrays are indexed with (row, col) ordering, which means the y component comes first and the x component second. Take special care to note that flow[y, x] is a vector whose first element is the x coordinate and whose second is the y coordinate, which is why I wrote y + flow[y, x, 1] and x + flow[y, x, 0]. You'll see the same thing written in the docs for calcOpticalFlowFarneback():

    The function finds an optical flow for each prev pixel using the Farneback algorithm so that

    prev(y,x) ~ next(y + flow(y,x)[1], x + flow(y,x)[0])
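
    A quick sanity check of that direction (just a sketch; it assumes prvs, next, and flow are plain NumPy arrays, so call .get() on them first if you passed cv2.UMat as in the question): pick a pixel, add its flow vector, and compare the two frames. The relation is approximate, so the values should be close rather than identical.

    y, x = 100, 200                       # any (row, col) location inside the frame
    dx, dy = flow[y, x]                   # flow vector at that pixel: (x component, y component)
    print(prvs[y, x])                     # value in the previous frame
    print(next[int(round(y + dy)), int(round(x + dx))])   # should be roughly the same value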
    

    Dense optical flow algorithms expect the pixels not to have moved very far from where they started, which is why they're typically used on video, where there isn't a huge amount of change between frames. If there is a massive difference from one frame to the next, you're likely not going to get a proper estimate. Of course, the purpose of the resolution pyramid is to help with larger jumps, but you'll need to take care when choosing the pyramid scale and number of levels.
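
    Since those settings hide behind positional arguments in the question's one-liner, here is the same call written out with the argument names from the OpenCV documentation as comments (a sketch only; the values are just the ones from the question, not a recommendation):

    flow = cv2.calcOpticalFlowFarneback(
        prvs, next,   # previous and next frames (single-channel, 8-bit)
        None,         # flow: no initial flow estimate
        0.5,          # pyr_scale: each pyramid level is half the resolution of the previous one
        3,            # levels: number of pyramid levels
        15,           # winsize: averaging window size
        3,            # iterations: iterations at each pyramid level
        5,            # poly_n: pixel neighborhood used for the polynomial expansion
        1.2,          # poly_sigma: Gaussian sigma used to smooth derivatives for the expansion
        0)            # flags: e.g. cv2.OPTFLOW_FARNEBACK_GAUSSIAN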

    Here's a full-fledged example. I'll start with this short timelapse that I shot in Vancouver earlier this year. I'll create a function which maps the direction of the flow at each pixel to a color, and the magnitude of the flow to the brightness of that color. That means brighter pixels correspond to larger flows, and the color corresponds to the direction. This is also what they do in the last example of the OpenCV optical flow tutorial.

    import cv2
    import numpy as np
    
    def flow_to_color(flow, hsv):
        # Flow direction -> hue, flow magnitude -> value (brightness); saturation stays at 255.
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        hsv[..., 0] = ang*180/np.pi/2                                    # radians -> OpenCV hue range [0, 180)
        hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # scale magnitudes to [0, 255]
        return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    
    cap = cv2.VideoCapture('vancouver.mp4')
    
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter('optflow.mp4', fourcc, fps, (w, h))
    
    optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]  # pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    
    frame_exists, prev_frame = cap.read()
    prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    hsv = np.zeros_like(prev_frame)
    hsv[..., 1] = 255  # full saturation; hue and value are filled in per frame by flow_to_color()
    
    while(cap.isOpened()):
        frame_exists, curr_frame = cap.read()
        if frame_exists:
            curr = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(prev, curr, None, *optflow_params)
            rgb = flow_to_color(flow, hsv)
            out.write(rgb)
            prev = curr
        else:
            break
    
    cap.release()
    out.release()
    print('done')
    

    And here's the resulting video.

    However, what you want to do is interpolate between frames. This gets a little confusing, because the best way to do that is with cv2.remap(), but that function works in the opposite direction to the one we want. The optical flow tells us where each pixel goes, whereas remap() wants to know where each pixel came from. So we actually need to swap the order of the frames in the optical flow calculation we feed to remap(). See my answer here for a thorough explanation of the remap() function.
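
    To make that "where it came from" convention concrete, here's a tiny standalone sketch (the file name frame.png and the 5-pixel shift are just placeholders): a map whose x coordinates are x + 5 makes every output pixel fetch its value from 5 pixels to its right, so the image content appears shifted 5 pixels to the left.

    import cv2
    import numpy as np

    img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input image
    h, w = img.shape
    y_coords, x_coords = np.mgrid[0:h, 0:w]
    # remap() looks pixels up: dst(y, x) = src(map_y(y, x), map_x(y, x))
    pixel_map = np.float32(np.dstack([x_coords + 5, y_coords]))
    shifted = cv2.remap(img, pixel_map, None, cv2.INTER_LINEAR)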

    So here I've created a function interpolate_frames() which will interpolate however many intermediate frames you want from the flow. This works exactly as we discussed in the comments, but note the flipped ordering of curr and prev inside calcOpticalFlowFarneback().

    The timelapse video above is a bad candidate since the interframe movement is very high. Instead, I'll use a short clip from another video shot in the same location as the input.

    import cv2
    import numpy as np
    
    
    def interpolate_frames(frame, coords, flow, n_frames):
        # Returns n_frames frames: the original plus (n_frames - 1) frames warped a
        # fraction of the way along the flow field.
        frames = [frame]
        for f in range(1, n_frames):
            # float() guards against integer division on Python 2.7
            pixel_map = coords + (float(f) / n_frames) * flow
            inter_frame = cv2.remap(frame, pixel_map, None, cv2.INTER_LINEAR)
            frames.append(inter_frame)
        return frames
    
    
    cap = cv2.VideoCapture('vancouver.mp4')
    
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter('optflow-inter1a.mp4', fourcc, fps, (w, h))
    
    optflow_params = [0.5, 3, 15, 3, 5, 1.2, 0]
    
    frame_exists, prev_frame = cap.read()
    prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    y_coords, x_coords = np.mgrid[0:h, 0:w]
    coords = np.float32(np.dstack([x_coords, y_coords]))  # identity (x, y) sampling map for remap()
    
    while(cap.isOpened()):
        frame_exists, curr_frame = cap.read()
        if frame_exists:
            curr = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
            flow = cv2.calcOpticalFlowFarneback(curr, prev, None, *optflow_params)  # note the swapped order: this flow points from curr back into prev, which is what remap() needs
            inter_frames = interpolate_frames(prev_frame, coords, flow, 4)
            for frame in inter_frames:
                out.write(frame)
            prev_frame = curr_frame
            prev = curr
        else:
            break
    
    cap.release()
    out.release()
    

    And here's the output. There are 4 frames for every frame in the original, so it's slowed down 4x. Of course, black edge pixels will creep in, so when doing this you'll probably want to do some sort of border extension on your frames (you can use cv2.copyMakeBorder()) to repeat similar edge pixels, and/or crop the final output a bit to get rid of that. Note that most video stabilization algorithms crop the image for similar reasons. That's part of the reason why, when you switch your phone camera to video, it looks a bit zoomed in (a slightly larger focal length).
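
    As a rough sketch of the padding idea (reusing the names from interpolate_frames() and the loop above; the 20-pixel margin is an arbitrary choice), you could replicate the border of prev_frame before remapping while keeping the map the same size, so pixels pulled in from outside the original frame repeat the edge instead of coming up black:

    pad = 20  # margin in pixels; should cover the largest expected flow magnitude
    padded = cv2.copyMakeBorder(prev_frame, pad, pad, pad, pad, cv2.BORDER_REPLICATE)
    # the same content now sits at (x + pad, y + pad) in the padded frame,
    # so shift the sampling coordinates accordingly before calling remap()
    pixel_map = np.float32(coords + pad + (float(f) / n_frames) * flow)
    inter_frame = cv2.remap(padded, pixel_map, None, cv2.INTER_LINEAR)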