audio ffmpeg signal-processing multimedia

FFMPEG - Multi Track, Multi Channel file to discrete mono files


I have files that are multi-track and multi-channel (e.g., track 1 may be 5.1, track 2 may be stereo, track 3 may be stereo, etc.).

I am looking to output every channel from every track into its own 'unrolled' discrete mono file.

example media:

ffprobe version 4.3.1-0york0~18.04 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version='0york0~18.04' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libzimg --enable-pocketsphinx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
[mxf @ 0x55d3e7fc2680] wrapping of stream 0 is unknown
[jpeg2000 @ 0x55d3e805ce00] End mismatch 1
    Last message repeated 1 times
Input #0, mxf, from 'redacted.mxf':
  Metadata:
    operational_pattern_ul: 060e2b34.04010101.0d010201.01010900
    modification_date: 2019-10-03T09:58:16.368000Z
    uid             : f6267ae2-680e-4357-9b1d-c77c045d3cd7
    generation_uid  : e7e6f5a1-6f15-4df5-aea8-a41f3ef535d6
    company_name    : redacted
    product_name    : redacted
    product_version : 11.6.1.5.301404
    product_uid     : 84ae5ffc-4710-11dd-a6fe-0010c629ec73
    application_platform: 4KICR1
    material_package_umid: 0x060A2B340101010501010D2013000000BE3608F3135E48AD99E4340643E47F22
    timecode        : 00:59:20:00
  Duration: 00:26:16.07, start: 0.000000, bitrate: 139194 kb/s
    Stream #0:0: Video: jpeg2000, yuv422p10le(progressive), 1920x1080, SAR 1:1 DAR 16:9, 23.98 tbr, 23.98 tbn, 23.98 tbc
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Picture
    Stream #0:1: Audio: pcm_s24le, 48000 Hz, 6 channels, s32 (24 bit), 6912 kb/s
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Sound
    Stream #0:2: Audio: pcm_s24le, 48000 Hz, 2 channels, s32 (24 bit), 2304 kb/s
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Sound
    Stream #0:3: Audio: pcm_s24le, 48000 Hz, 2 channels, s32 (24 bit), 2304 kb/s
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Sound
    Stream #0:4: Audio: pcm_s24le, 48000 Hz, 2 channels, s32 (24 bit), 2304 kb/s
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Sound
    Stream #0:5: Data: none
    Metadata:
      file_package_umid: 0x060A2B340101010501010D201300000091A43E578B86490698045924FA9EECC5
      track_name      : Auxiliary Data
      data_type       : vbi_vanc_smpte_436M
Unsupported codec with id 0 for input stream 5

These files are vendor-qualified masters, and the track/channel combinations vary between vendors: some might be stereo, 5.1, or 7.1; some might already be all discrete mono; some might be a combination of discrete stereo, 5.1, and mono tracks. It's all a mix, so I'm looking for a general strategy that gracefully handles all channels from all tracks.

I have seen various strategies for discretizing audio in the ffmpeg docs, but none of them seem to show how to target different channels from different tracks. I'm sure it's a PEBKAC error, but I'd love some guidance.

I have tried both a -map_channel approach and a -filter_complex channelsplit approach.

ffmpeg -i redacted.mxf -ss 60 \
-map_channel 0.1.0 -t 10 track_1_0.wav \
-map_channel 0.1.1 -t 10 track_1_1.wav \
-map_channel 0.1.2 -t 10 track_1_2.wav \
-map_channel 0.1.3 -t 10 track_1_3.wav \
-map_channel 0.1.4 -t 10 track_1_4.wav \
-map_channel 0.1.5 -t 10 track_1_5.wav \
-map_channel 0.2.0 -t 10 track_2_0.wav \
-map_channel 0.2.1 -t 10 track_2_1.wav \
-map_channel 0.3.0 -t 10 track_3_0.wav \
-map_channel 0.3.1 -t 10 track_3_1.wav \
-map_channel 0.4.0 -t 10 track_4_0.wav \
-map_channel 0.4.1 -t 10 track_4_1.wav 

However, the output files are not all mono; some are marked as 5.1. I don't believe they are inheriting a sane, correct channel layout (mono). The output files marked 5.1 are nonsensical, because they are all sourced from stereo tracks: track_2_0.wav, track_2_1.wav, track_3_0.wav, track_3_1.wav, track_4_0.wav, and track_4_1.wav. By contrast, track_1_0.wav from the above command reports sane media info:

File size                                : 938 KiB
Duration                                 : 10s 0ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 768 Kbps
Writing application                      : Lavf58.45.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 10s 0ms
Bit rate mode                            : Constant
Bit rate                                 : 768 Kbps
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 KHz
Bit depth                                : 16 bits
Stream size                              : 938 KiB (100%)

However, the outputs from the second and third tracks have the wrong channel layout and an unexpected codec ID:

Format                                   : Wave
File size                                : 5.49 MiB
Duration                                 : 10s 0ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 4 608 Kbps
Writing application                      : Lavf58.45.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 00000001-0000-0010-8000-00AA00389B71
Duration                                 : 10s 0ms
Bit rate mode                            : Constant
Bit rate                                 : 4 608 Kbps
Channel(s)                               : 6 channels
Channel layout                           : L R C LFE Lb Rb
Sampling rate                            : 48.0 KHz
Bit depth                                : 16 bits
Stream size                              : 5.49 MiB (100%)

Additionally, regarding -map_channel, the docs cast some doubt on whether it's the right approach:

Note that currently each output stream can only contain channels from a single input stream; you can’t for example use "-map_channel" to pick multiple input audio channels contained in different streams (from the same or different files) and merge them into a single output stream. It is therefore not currently possible, for example, to turn two separate mono streams into a single stereo stream. However splitting a stereo stream into two single channel mono streams is possible.

With -filter_complex, the docs and bug tracker have an example of discretizing 5.1 and marking each output as mono. I can target the tracks I want and get a valid filter chain (as seen in the debug log), but I only get audio for the first track:

ffmpeg -y -v 40 -i redacted.mxf -ss 60 \
    -disposition:a default \
    -filter_complex \
    "[0:a:0]channelsplit=channel_layout=5.1[c1][c2][c3][c4][c5][c6],\
    [c1]aformat=channel_layouts=mono[c1],\
    [c2]aformat=channel_layouts=mono[c2],\
    [c3]aformat=channel_layouts=mono[c3],\
    [c4]aformat=channel_layouts=mono[c4],\
    [c5]aformat=channel_layouts=mono[c5],\
    [c6]aformat=channel_layouts=mono[c6],\
    [0:a:1]channelsplit=channel_layout=stereo[c7][c8],\
    [c7]aformat=channel_layouts=mono[c7],\
    [c8]aformat=channel_layouts=mono[c8],\
    [0:a:2]channelsplit=channel_layout=stereo[c9][c10],\
    [c9]aformat=channel_layouts=mono[c9],\
    [c10]aformat=channel_layouts=mono[c10],\
    [0:a:3]channelsplit=channel_layout=stereo[c11][c12],\
    [c11]aformat=channel_layouts=mono[c11],\
    [c12]aformat=channel_layouts=mono[c12]"\
     -map  "[c1]" -t 10 1.wav\
     -map  "[c2]" -t 10 2.wav\
     -map  "[c3]" -t 10 3.wav\
     -map  "[c4]" -t 10 4.wav\
     -map  "[c5]" -t 10 5.wav\
     -map  "[c6]" -t 10 6.wav\
     -map  "[c7]" -t 10 7.wav\
     -map  "[c8]" -t 10 8.wav\
     -map  "[c9]" -t 10 9.wav\
     -map  "[c10]" -t 10 10.wav\
     -map  "[c11]" -t 10 11.wav\
     -map  "[c12]" -t 10 12.wav

TL;DR

In short, how does one export every channel of every track as a discrete mono audio file, regardless of the channel layouts?

Thank you!


Solution

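  • You can't reuse labels from filter outputs. In your command each aformat reads from a label such as [c1] and writes back to the same label. Use intermediate labels for the channelsplit outputs instead.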

    ffmpeg -y -v 40 -i redacted.mxf -ss 60 \
        -disposition:a default \
        -filter_complex \
        "[0:a:0]channelsplit=channel_layout=5.1[a1][a2][a3][a4][a5][a6],\
        [a1]aformat=channel_layouts=mono[c1],\
        [a2]aformat=channel_layouts=mono[c2],\
        [a3]aformat=channel_layouts=mono[c3],\
        [a4]aformat=channel_layouts=mono[c4],\
        [a5]aformat=channel_layouts=mono[c5],\
        [a6]aformat=channel_layouts=mono[c6],\
        [0:a:1]channelsplit=channel_layout=stereo[a7][a8],\
        [a7]aformat=channel_layouts=mono[c7],\
        [a8]aformat=channel_layouts=mono[c8],\
        [0:a:2]channelsplit=channel_layout=stereo[a9][a10],\
        [a9]aformat=channel_layouts=mono[c9],\
        [a10]aformat=channel_layouts=mono[c10],\
        [0:a:3]channelsplit=channel_layout=stereo[a11][a12],\
        [a11]aformat=channel_layouts=mono[c11],\
        [a12]aformat=channel_layouts=mono[c12]"\
         -map  "[c1]" -t 10 1.wav\
         -map  "[c2]" -t 10 2.wav\
         -map  "[c3]" -t 10 3.wav\
         -map  "[c4]" -t 10 4.wav\
         -map  "[c5]" -t 10 5.wav\
         -map  "[c6]" -t 10 6.wav\
         -map  "[c7]" -t 10 7.wav\
         -map  "[c8]" -t 10 8.wav\
         -map  "[c9]" -t 10 9.wav\
         -map  "[c10]" -t 10 10.wav\
         -map  "[c11]" -t 10 11.wav\
         -map  "[c12]" -t 10 12.wav
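
For the "regardless of the channel layouts" part of the question, a rough sketch of one way to generate such a command automatically follows. It swaps channelsplit for the pan filter, on the assumption that pan's cN syntax picks input channel N by index, so no layout name has to be known in advance. It also assumes that ffprobe prints one channel count per audio stream with the options shown, and that the same input stream label may be referenced more than once inside -filter_complex. The function names build_graph and split_all and the N.wav output names are made up for illustration; treat this as a starting point, not a tested recipe for every master.

```shell
#!/bin/sh
# build_graph: hypothetical helper. Its arguments are the channel counts of
# audio streams 0, 1, 2, ... in order. It prints a -filter_complex string
# with one pan filter per channel, labelled [c1], [c2], ...
build_graph() {
  graph="" stream=0 n=0
  for count in "$@"; do
    ch=0
    while [ "$ch" -lt "$count" ]; do
      n=$((n + 1))
      # Each output label is unique, so no label is ever reused.
      graph="${graph}${graph:+;}[0:a:${stream}]pan=mono|c0=c${ch}[c${n}]"
      ch=$((ch + 1))
    done
    stream=$((stream + 1))
  done
  printf '%s\n' "$graph"
}

# split_all: hypothetical wrapper. Probes the input file given as $1 and
# writes 1.wav, 2.wav, ... into the current directory.
split_all() {
  input=$1
  # Assumption: this prints one channel count per line, one line per audio stream.
  counts=$(ffprobe -v error -select_streams a \
    -show_entries stream=channels -of csv=p=0 "$input")
  # Word splitting of $counts is intentional here.
  graph=$(build_graph $counts)
  total=0
  for c in $counts; do total=$((total + c)); done
  # Build the ffmpeg argument list in the positional parameters.
  set -- -y -i "$input" -filter_complex "$graph"
  n=1
  while [ "$n" -le "$total" ]; do
    set -- "$@" -map "[c${n}]" "${n}.wav"
    n=$((n + 1))
  done
  ffmpeg "$@"
}
```

After sourcing the script, running `split_all redacted.mxf` on the example media should produce twelve files. WAV output defaults to 16-bit PCM, as the media info in the question shows, so an explicit audio codec option per output would be needed to keep the source's 24-bit depth.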