
MPEG-4 Part 2 had some awesome face- and body-motion concepts, but they disappeared in MPEG-4 Part 10 (H.264). Why?


During the last few weeks, I had the opportunity to read two documents:

  • The MPEG-4 Part 2 specification (ISO/IEC 14496-2), which people just call "mpeg-4"
  • The MPEG-4 Part 10 specification (ISO/IEC 14496-10), which is also called "h.264" or "AVC"

After reading all the cool ideas in "mpeg-4", like describing facial expressions, the motion of people's limbs, and sprites, I got really excited. The ideas sound like a lot of fun, maybe even fantastic, for ideas from 1999.

But then I read the "h.264" standard, and none of those ideas were there. There was a lot of discussion on how to encode pixels, but none of the really cool ideas.

What happened? Why were these ideas removed?

This is not a code question, but as a programmer I feel I should try to understand as much of the intent behind a specification as I can. If the code I write adheres to the spirit in which the specification was meant to be used, it is better positioned to take advantage of the entire specification.


Solution

  • You seem to be assuming that the MPEG-4 Part 10 specification is an improved version of MPEG-4 Part 2. In fact, the two are separate specifications with different goals, and Part 10 does not extend Part 2. They were also developed by different groups: Part 2 by MPEG (ISO/IEC JTC 1/SC 29/WG 11), and Part 10 by the Joint Video Team (JVT), a collaboration between the ITU-T Video Coding Experts Group (VCEG) and MPEG, which is why the same text is also published as ITU-T Recommendation H.264.

    Keep in mind that the ISO/IEC 14496 standard is a collection of parts that cover different aspects of audiovisual coding. The goal of Part 2 (Visual) is to code many kinds of visual objects: natural video, but also synthetic content such as animated faces and bodies, meshes and sprites. The goal of Part 10 is much narrower: to code rectangular video as efficiently and at as high a quality as possible. Other parts deal with other aspects. For example, Part 3 covers audio coding (including AAC), and Part 12 (the ISO base media file format), together with Part 14 (the MP4 file format) and Part 15 (carriage of AVC video), defines the container most commonly used to wrap Part 10 video (i.e. H.264) and Part 3 audio (e.g. AAC) into a single file, the so-called .mp4 format.

    I hope this helps!
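To make the container point concrete: the ISO base media file format (Part 12), on which .mp4 is built, organizes a file as a sequence of "boxes", each starting with a 32-bit big-endian size and a four-character type such as `ftyp`, `moov` or `mdat`. Below is a minimal Python sketch that only relies on those header rules (plus the two special sizes: 1 means a 64-bit "largesize" follows the type, 0 means the box runs to the end of the enclosing data). The names `iter_boxes` and `make_box` are hypothetical helpers, and the sample bytes are synthetic placeholders rather than a playable file.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (type, offset, size, header_len) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    offset = start
    while offset + 8 <= end:
        # Every box header begins with a 32-bit big-endian size and a 4-char type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # size == 1: a 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated largesize header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # size == 0: the box extends to the end of the enclosing data.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size, header_len
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a box with a plain 32-bit size header."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic layout only: the payloads are placeholders, not real media data.
ftyp = make_box(b"ftyp", b"isom" + struct.pack(">I", 512) + b"isomavc1mp41")
moov = make_box(b"moov", make_box(b"mvhd", bytes(100)))
mdat = make_box(b"mdat", bytes(16))
sample = ftyp + moov + mdat

if __name__ == "__main__":
    for box_type, offset, size, header_len in iter_boxes(sample):
        print(f"{box_type} at {offset}, {size} bytes")
        if box_type == "moov":
            # moov is a container box: its payload is itself a list of boxes.
            for child, c_off, c_size, _ in iter_boxes(sample, offset + header_len,
                                                      offset + size):
                print(f"  {child} at {c_off}, {c_size} bytes")
```

Run as a script, this prints `ftyp at 0, 28 bytes`, `moov at 28, 116 bytes`, `  mvhd at 36, 108 bytes` and `mdat at 144, 24 bytes`. In a typical non-fragmented .mp4, the coded H.264 and AAC samples sit in `mdat`, while `moov` holds the metadata that describes and locates them; the codecs themselves know nothing about this wrapping.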