Tags: video, html5-video, http-live-streaming, mpeg-dash, smooth-streaming

Splitting mp4 files vs mp4-dash


When serving video to users on a website, there were a few options to choose from, namely HLS, Smooth streaming, Dash or HDS. Dash seemed to be the better choice. From what I can see, it works by splitting the file into many parts and streaming them. Another option would be to split the files manually. What is the difference between Dash and manually splitting MP4 files?


Solution

  • Dash, Smooth streaming and HLS are all adaptive streaming technologies. These technologies allow you to:

    • Serve content in segments - each segment is a small playable chunk of content (audio, video or even text, e.g. captions). A single segment is usually a few seconds long. That's what makes it a "streaming" technology, and it is very similar to what you could try to achieve by splitting MP4 files manually.
    • Serve content in multiple quality levels - depending on the network connection, performance and screen resolution of the target device, the player can pick an appropriate quality level to reduce the chance of buffering or stuttering. To make this work, the segment with a given index must be exactly aligned (same start and length) across all quality levels - this is achieved during encoding. That's what makes it an "adaptive" technology.
    • Consume a manifest - the manifest is a description of the whole content and all available quality levels. You can have a single piece of video content in 10+ quality levels, with several different audio streams (different codecs or languages) that also come in a few quality levels. To consume it, you need to tell the player where to find the individual segments - that is the purpose of the manifest. Different technologies use different manifest formats. Dash provides many options for describing the content. The verbose option consists of a single MP4 source file per quality level, with each segment described by its byte offset from the beginning of the file and its length in bytes. But you can also use more compact descriptions, such as a segment template, and request segments by index.
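
As a rough illustration of the last two points, here is a minimal Python sketch of how a player might choose a quality level and then build a segment URL from a segment template. The `$RepresentationID$` and `$Number$` placeholders are the ones Dash segment templates use; everything else (the `cdn.example.com` URL, the representation ids, the bitrates and the 0.8 safety factor) is made up for the example, and real players use considerably more elaborate selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One quality level; the ids and bitrates used below are hypothetical."""
    rep_id: str
    bandwidth_bps: int


def pick_representation(reps, measured_bps, safety=0.8):
    """Return the highest-bitrate level that fits a fraction of the measured throughput."""
    budget = measured_bps * safety
    fitting = [r for r in reps if r.bandwidth_bps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest quality instead of stalling.
        return min(reps, key=lambda r: r.bandwidth_bps)
    return max(fitting, key=lambda r: r.bandwidth_bps)


def segment_url(template, rep, number):
    """Expand the $RepresentationID$ and $Number$ placeholders of a segment template."""
    return template.replace("$RepresentationID$", rep.rep_id).replace("$Number$", str(number))


REPS = [
    Representation("video-360p", 800_000),
    Representation("video-720p", 2_500_000),
    Representation("video-1080p", 5_000_000),
]
TEMPLATE = "https://cdn.example.com/$RepresentationID$/seg-$Number$.m4s"

# Simulated throughput drops before segment 3, so the player steps down one level.
throughput_samples_bps = [6_500_000, 6_500_000, 3_200_000]
for number, measured in enumerate(throughput_samples_bps, start=1):
    rep = pick_representation(REPS, measured)
    print(segment_url(TEMPLATE, rep, number))
```

Switching from 1080p to 720p between segments 2 and 3 only works because segment 3 starts at the same point in time in every quality level, which is the alignment requirement described above. Manually split MP4 files give you the segments, but not this switching logic or a standard way to describe it.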

    So while you could achieve all of that by creating your own protocol, why would you do that instead of using a standard?

    To answer your question from the comments: is there any difference in the total data transferred between the two cases?

    In general, no. It is still the same video and audio content, with the addition of the manifest. The manifest is a text file (easily gzipped), and its size depends heavily on how the content is described. With the verbose option, it depends on the length of the content, the average segment length, the number of streams and the number of quality levels.
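
To get a feel for how the verbose description grows, the following back-of-the-envelope Python sketch counts per-segment entries. The film length, segment length and stream counts are assumed figures, not measurements, and the function name is made up for the example.

```python
import math


def verbose_manifest_entries(duration_s, segment_s, representations):
    """Count entries when every segment of every representation is listed."""
    segments_per_representation = math.ceil(duration_s / segment_s)
    return segments_per_representation * representations


# Assumed example: 2-hour film, 4-second segments, 6 video levels + 2 audio streams.
entries = verbose_manifest_entries(duration_s=2 * 60 * 60, segment_s=4, representations=6 + 2)
print(entries)  # 1800 segments per representation * 8 representations = 14400
```

A template-based description, by contrast, stays roughly the same size no matter how long the content is, because segment URLs are derived from an index instead of being listed one by one.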

    Once you start using the full power of Dash and serve lower quality levels to clients that do not need, or cannot play, the higher ones, you can significantly reduce the amount of data transferred.
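
The size of that saving can be estimated with simple arithmetic. The sketch below assumes a 10-minute clip and two hypothetical average bitrates (5 Mbit/s and 800 kbit/s), and ignores container and HTTP overhead.

```python
def transferred_megabytes(bitrate_bps, duration_s):
    """Approximate payload size for a constant average bitrate, ignoring overhead."""
    return bitrate_bps * duration_s / 8 / 1_000_000


DURATION_S = 10 * 60  # a 10-minute clip, assumed

full_hd = transferred_megabytes(5_000_000, DURATION_S)  # hypothetical 1080p level
mobile = transferred_megabytes(800_000, DURATION_S)     # hypothetical 360p level
print(full_hd, mobile)  # 375.0 60.0
```

Under these assumptions, a client that only ever needs the lowest level downloads about 60 MB instead of 375 MB for the same clip.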