I need to encapsulate H.264 video in an MPEG-4 (MP4) container. What is the absolute minimum set of boxes (atoms) I need? The H.264 video is progressive, 30 fps, YUV420p, with no audio, no subtitles and no program information: only one stream. No performance or file-size optimization is required. It will be a non-fragmented MP4 for the time being. Would it make things simpler to make it fragmented? Modest performance is acceptable.
ftyp
moov
  mvhd
  trak
    tkhd
    mdia
      mdhd
      hdlr
      minf
        vmhd
        dinf
          dref
            url  (the four-character code is "url " with a trailing space)
        stbl
          stsd
            avc1
              avcC
          stts
          stsc
          stsz
          stco
          stss
mdat
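To make the tree concrete, here is a minimal Python sketch of how boxes nest and how the sample-table leaves of stbl could be filled for a constant 30 fps track. It assumes 32-bit box sizes and stco offsets (files under 4 GiB), an mdhd timescale of 30000 (so each frame lasts 1000 ticks), and all samples stored in a single chunk. The helper names are hypothetical, and stsd/avc1/avcC, mvhd, tkhd and the other header boxes are deliberately not built here.

```python
import struct

def u32s(values) -> bytes:
    """Pack a sequence of unsigned 32-bit big-endian integers."""
    return b"".join(struct.pack(">I", v) for v in values)

def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Plain box: 32-bit size (header included), 4-byte type, payload."""
    assert len(box_type) == 4, "box types are exactly four bytes, e.g. b'url '"
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
    """Full box: payload is preceded by an 8-bit version and 24-bit flags."""
    return box(box_type, struct.pack(">I", (version << 24) | (flags & 0xFFFFFF)) + payload)

def container(box_type: bytes, *children: bytes) -> bytes:
    """Container boxes (moov, trak, mdia, minf, stbl, ...) concatenate their children."""
    return box(box_type, b"".join(children))

def dinf_self_contained() -> bytes:
    # dref with one 'url ' entry; flag 1 means "media data is in this same file".
    url = full_box(b"url ", 0, 1, b"")
    return container(b"dinf", full_box(b"dref", 0, 0, u32s([1]) + url))

def stts(sample_count: int, sample_delta: int) -> bytes:
    # One run: every sample lasts sample_delta ticks of the mdhd timescale.
    return full_box(b"stts", 0, 0, u32s([1, sample_count, sample_delta]))

def stsc_one_chunk(samples_per_chunk: int) -> bytes:
    # One entry: first_chunk=1, samples_per_chunk, sample_description_index=1.
    return full_box(b"stsc", 0, 0, u32s([1, 1, samples_per_chunk, 1]))

def stsz(sample_sizes) -> bytes:
    # sample_size=0 signals that a per-sample size table follows.
    return full_box(b"stsz", 0, 0, u32s([0, len(sample_sizes)]) + u32s(sample_sizes))

def stco(chunk_offsets) -> bytes:
    # Absolute file offsets of each chunk's first byte (inside mdat).
    return full_box(b"stco", 0, 0, u32s([len(chunk_offsets)]) + u32s(chunk_offsets))

def stss(sync_samples) -> bytes:
    # 1-based sample numbers of IDR (key) frames.
    return full_box(b"stss", 0, 0, u32s([len(sync_samples)]) + u32s(sync_samples))

# Example: 3 frames, frame 1 is an IDR, one chunk at a hypothetical offset 4096.
sizes = [5000, 1200, 1100]
tables = (stts(len(sizes), 1000) + stsc_one_chunk(len(sizes))
          + stsz(sizes) + stco([4096]) + stss([1]))
```

Because stco holds absolute file offsets, a writer that puts moov before mdat can compute the moov size first (it does not depend on the offset values themselves) and then set the chunk offset to len(ftyp) + len(moov) + 8, the 8 being the 32-bit mdat header.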
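You'll also need ctts (the composition time-to-sample box) if you have B-frames, or more generally whenever presentation order differs from decoding order.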
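A hedged sketch of how ctts values could be derived, assuming decode times start at 0 and advance by a constant sample_delta, and that each sample's presentation time is known in the same timescale. Version 0 of ctts only allows non-negative offsets, so the example shifts presentation times by the reordering delay (one frame here); an edit list, which is not part of the minimal set above, is the usual way to hide that initial delay on playback. Function names are hypothetical.

```python
import struct

def ctts_entries(pts_in_decode_order, sample_delta):
    """Run-length encode composition offsets (CTS - DTS), one per sample."""
    entries = []  # list of [sample_count, sample_offset] runs
    for i, pts in enumerate(pts_in_decode_order):
        offset = pts - i * sample_delta  # DTS of sample i is i * sample_delta
        if entries and entries[-1][1] == offset:
            entries[-1][0] += 1
        else:
            entries.append([1, offset])
    return [tuple(e) for e in entries]

def ctts(pts_in_decode_order, sample_delta):
    """Serialize a version-0 ctts full box."""
    entries = ctts_entries(pts_in_decode_order, sample_delta)
    if any(offset < 0 for _, offset in entries):
        raise ValueError("version 0 ctts needs non-negative offsets; shift PTS by the reorder delay")
    body = struct.pack(">I", len(entries))
    body += b"".join(struct.pack(">II", count, offset) for count, offset in entries)
    # Header: size, type, then version 0 and flags 0 packed into one 32-bit word.
    return struct.pack(">I", 12 + len(body)) + b"ctts" + struct.pack(">I", 0) + body

# Decode order I0 P3 B1 B2 (display order I0 B1 B2 P3), 1000 ticks per frame,
# PTS shifted by one frame so that every offset is >= 0.
pts = [1000, 4000, 2000, 3000]
print(ctts_entries(pts, 1000))  # [(1, 1000), (1, 3000), (2, 0)]
```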