I'm attempting to stream an H.264 video feed to a web browser. Media Foundation is used for encoding a fragmented MPEG4 stream (`MFCreateFMPEG4MediaSink` with `MFTranscodeContainerType_FMPEG4`, `MF_LOW_LATENCY` and `MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS` enabled). The stream is then connected to a web server through `IMFByteStream`.
Streaming of the H.264 video works fine when it's consumed by a `<video src=".."/>` tag. However, the resulting latency is ~2 sec, which is too much for the application in question. My suspicion is that client-side buffering causes most of the latency. I'm therefore experimenting with Media Source Extensions (MSE) for programmatic control over the in-browser streaming. However, Chrome fails with the following error when consuming the same MPEG4 stream through MSE:
```
Failure parsing MP4: TFHD base-data-offset not allowed by MSE. See
https://www.w3.org/TR/mse-byte-stream-format-isobmff/#movie-fragment-relative-addressing
```
Below is an mp4dump of a moof/mdat fragment in the MPEG4 stream. It clearly shows that the `tfhd` box contains an "illegal" base data offset parameter:

```
[moof] size=8+200
  [mfhd] size=12+4
    sequence number = 3
  [traf] size=8+176
    [tfhd] size=12+16, flags=1
      track ID = 1
      base data offset = 36690
    [trun] size=12+136, version=1, flags=f01
      sample count = 8
      data offset = 0
[mdat] size=8+1624
```
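A stream interceptor first needs to locate the offending `tfhd` box inside each fragment. A minimal sketch of such a box walker, assuming the fragment is available as a contiguous byte buffer; `FindTfhd` and `ReadU32` are hypothetical helper names, not part of any library:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a 32-bit big-endian integer from an MP4 box buffer.
static uint32_t ReadU32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// Scan a buffer of MP4 boxes for a "tfhd" box, descending into
// "moof" and "traf" container boxes. Returns a pointer to the start
// of the tfhd box, or nullptr if none is found.
static const uint8_t* FindTfhd(const uint8_t* buf, size_t len) {
    size_t pos = 0;
    while (pos + 8 <= len) {
        uint32_t size = ReadU32(buf + pos);     // box size (incl. header)
        const uint8_t* type = buf + pos + 4;    // 4-char box type
        if (size < 8 || pos + size > len)
            return nullptr;                     // malformed box, give up
        if (!memcmp(type, "tfhd", 4))
            return buf + pos;
        if (!memcmp(type, "moof", 4) || !memcmp(type, "traf", 4)) {
            // container box: search its payload recursively
            if (const uint8_t* hit = FindTfhd(buf + pos + 8, size - 8))
                return hit;
        }
        pos += size;                            // next sibling box
    }
    return nullptr;
}
```

Once the `tfhd` is found, `ReadU32(tfhd + 8) & 0xFFFFFF` yields its flags field, where bit `0x000001` is base-data-offset-present — exactly the flag MSE rejects.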
I'm using Chrome 65.0.3325.181 (Official Build) (32-bit), running on Win10 version 1709 (16299.309).
Is there any way of generating an MSE-compatible H.264/MPEG4 video stream using Media Foundation?
Based on roman-r's advice, I managed to fix the problem myself by intercepting the generated MPEG4 stream and performing the following modifications:

- Modify the Track Fragment Header box (`tfhd`):
  - remove the `base_data_offset` parameter (reduces stream size by 8 bytes)
  - set the `default-base-is-moof` flag
- Add the missing Track Fragment Decode Time box (`tfdt`) (increases stream size by 20 bytes):
  - set the `baseMediaDecodeTime` parameter
- Modify the Track Fragment Run box (`trun`):
  - adjust the `data_offset` parameter
The field descriptions are documented in https://www.iso.org/standard/68960.html (free download).
Switching to MSE-based video streaming reduced the latency from ~2.0 to 0.7 sec. The latency was further reduced to 0–1 frames by calling `IMFSinkWriter::NotifyEndOfSegment` after each `IMFSinkWriter::WriteSample` call.
There's a sample implementation available at https://github.com/forderud/AppWebStream
The mentioned 0.7 sec latency (in your Status Update) is caused by Media Foundation's `MFTranscodeContainerType_FMPEG4` containerizer, which (for an unknown reason) gathers roughly 1/3 second of frames and outputs them as one MP4 `moof`/`mdat` box pair. This means that at 60 FPS you need to wait 19 frames before getting any output from `MFTranscodeContainerType_FMPEG4`.
To output a single MP4 `moof`/`mdat` pair per frame, simply lie that `MF_MT_FRAME_RATE` is 1 FPS (or any rate whose frame duration exceeds 1/3 sec). To play the video at the correct speed, use Media Source Extensions' `<video>.playbackRate`, or rather update the `timescale` (i.e. multiply it by the real FPS) of the `mvhd` and `mdhd` boxes in your MP4 stream interceptor to get a correctly timed MP4 stream.
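The timescale fix-up is a single in-place field edit per box. A minimal sketch, assuming version-0 `mvhd`/`mdhd` boxes (where the 32-bit `timescale` field sits 20 bytes from the start of the box) and a real rate of 60 FPS while the sink was told 1 FPS; `ScaleTimescale` and the helpers are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

static uint32_t ReadU32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}
static void WriteU32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v >> 24); p[1] = uint8_t(v >> 16);
    p[2] = uint8_t(v >> 8);  p[3] = uint8_t(v);
}

// Multiply the timescale of a version-0 mvhd or mdhd box by 'fpsFactor'
// so a stream muxed at a fake 1 FPS plays back at the real frame rate.
// Version-0 layout: size(4) type(4) version/flags(4) creation_time(4)
// modification_time(4) timescale(4) ...
static void ScaleTimescale(uint8_t* box, uint32_t fpsFactor) {
    assert(box[8] == 0 && "version-0 box layout assumed");
    uint32_t timescale = ReadU32(box + 20);
    WriteU32(box + 20, timescale * fpsFactor);
}
```

Note that version-1 `mvhd`/`mdhd` boxes use 64-bit timestamps, which pushes the `timescale` field to byte offset 28; a robust interceptor should check the version byte before patching.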
Doing that, the latency can be squeezed to under 20 ms. This is barely perceptible when you watch the output side by side on `localhost` in chains such as Unity (research) -> NvEnc -> `MFTranscodeContainerType_FMPEG4` -> WebSocket -> Chrome Media Source Extensions display.
Note that `MFTranscodeContainerType_FMPEG4` still introduces a 1-frame delay (1st frame in, no output; 2nd frame in, 1st frame out; ...), hence the 20 ms latency at 60 FPS. The only solution to that seems to be writing your own FMPEG4 containerizer, but that is an order of magnitude more complex than intercepting Media Foundation's MP4 streams.