video compression frame video-processing motion

In motion compensation, is it enough for the sender to send only the motion information?

I came across motion compensation while learning about video compression. The study material explains it with pictures, but there are parts I do not understand well.

When the left frame is f0 and the right frame is f1, the difference between the two frames is the position of the basketball and the newly emerged glove.

The material says that the amount of transmitted data can be reduced by sending only the difference between the two frames (f1 - f0) together with the motion information of a specific object.

It also says that for new parts that did not appear in the previous frame, such as the glove, it is better to transmit the data as is rather than use the difference between frames.

But here I don't know why the data from f1-f0 is needed if the receiver has frame f0. Can't we just send the data about the movement information of the basketball and the data of the newly appeared glove? I don't know why the difference between the two frames is necessary.

If only the basketball moves without a glove in frame f1, can't frame f1 be formed by sending only motion information? If it is not right to send only movement information, I would like to know why.

Solution

Yes, for this cartoon example you could just send information about the motion of the ball and the appearance of the glove. However, in a real video, once you have done that, what about the background that was hidden behind the ball and is now exposed in f1? Does the ball look exactly the same, or did it rotate? What else changed in the scene?

What you would do is model the motion of objects from f0 to f1 and apply that model to f0, which gives you an imperfect prediction f1'. You then also send the residual f1 - f1' to cover the remaining differences. That residual is compressed lossily, using psycho-visual criteria to keep only what matters. (The residual will include the appearance of the glove.) Modeling the motion of the ball shrinks the residual, so f1 - f1' plus the extent and motion vector of the ball should take fewer bits to send than f1 - f0.
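To make this concrete, here is a minimal NumPy sketch of the idea, not how any real codec is implemented. The frame size, pixel values, block position, and the helper name `motion_compensate` are all made up for illustration, and counting nonzero values is only a crude stand-in for the number of bits needed.

```python
import numpy as np

def motion_compensate(prev, top, left, size, dy, dx):
    """Hypothetical helper: predict the next frame by copying one square
    block of `prev` to a position shifted by the motion vector (dy, dx)."""
    pred = prev.copy()
    block = prev[top:top + size, left:left + size].copy()
    pred[top + dy:top + dy + size, left + dx:left + dx + size] = block
    return pred

# f0: textured background (values 0..49) with a bright 4x4 "ball".
rng = np.random.default_rng(0)
background = rng.integers(0, 50, size=(16, 16)).astype(np.int16)
f0 = background.copy()
f0[2:6, 2:6] = 200

# f1: the ball has moved by (dy, dx) = (3, 5) and a 3x3 "glove" has appeared.
f1 = background.copy()
f1[5:9, 7:11] = 200
f1[12:15, 1:4] = 120

# The receiver already has f0, so it can build the prediction f1' itself
# from the block extent and the motion vector.
f1_pred = motion_compensate(f0, top=2, left=2, size=4, dy=3, dx=5)

plain_diff = f1 - f0       # what is sent without motion compensation
residual = f1 - f1_pred    # what is sent with motion compensation

n_plain = np.count_nonzero(plain_diff)
n_resid = np.count_nonzero(residual)
print(n_plain, n_resid)    # 41 25 in this toy setup

# The receiver reconstructs f1 exactly as prediction + residual.
f1_rebuilt = f1_pred + residual
assert np.array_equal(f1_rebuilt, f1)
```

The residual is smaller than the plain difference, but it is not empty. It still carries the glove and the background that the ball uncovered when it left its old position, which motion information alone cannot supply.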