sox - pipe input broken?


I am trying to use sox's piped-input syntax to process file formats that sox itself does not support, without having to convert all inputs first.

My latest attempt is the following (it creates a spectrogram of the difference between two AAC encodings of the same source):

sox -S -m \
    -v  1 -t s24 -r 48k -c 2 -L "|ffmpeg -i input_orig.aac -vn -f s24le -" \
    -v -1 -t s24 -r 48k -c 2 -L "|ffmpeg -i input_faac.aac -vn -f s24le -" \
    -n \
    spectrogram -x 1600 -y 480 -o diff.faac.png

However, the resulting spectrogram contains only the first few seconds, so I must be missing something.

But what?

Update

I tried a simpler test with a single pipe to see whether that works, but it shows the same issue:

ffmpeg -hide_banner -i input_orig.aac -vn -f s24le - | sox -S -t s24 -c 2 -r 48k -L - -n spectrogram -x 480 -y 96 -o orig.pipe.png

Result: [image]

...whereas running it directly on a pre-converted file (FLAC or WAV) produces the correct result: [image]

Output for the simple pipe command:

Input File     : '-' (raw)
Channels       : 2
Sample Rate    : 48000
Precision      : 24-bit
Sample Encoding: 24-bit Signed Integer PCM

In:0.00% 00:00:00.00 [00:00:00.00] Out:0     [      |      ]        Clip:0    [aac @ 0x5643e7dc9940] Estimating duration from bitrate, this may be inaccurate
Input #0, aac, from 'input_orig.aac':
  Duration: 00:01:35.08, bitrate: 136 kb/s
    Stream #0:0: Audio: aac (LC), 48000 Hz, stereo, fltp, 136 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s24le (native))
Press [q] to stop, [?] for help
Output #0, s24le, to 'pipe:':
  Metadata:
    encoder         : Lavf58.20.100
    Stream #0:0: Audio: pcm_s24le, 48000 Hz, stereo, s32, 2304 kb/s
    Metadata:
      encoder         : Lavc58.35.100 pcm_s24le
In:0.00% 00:01:27.55 [00:00:00.00] Out:4.20M [!=====|=====!] Hd:0.0 Clip:0    size=   28110kB time=00:01:39.94 bitrate=2304.0kbits/s speed= 417x    
video:0kB audio:28110kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
In:0.00% 00:01:39.95 [00:00:00.00] Out:4.80M [!=====|======] Hd:0.0 Clip:0    
Done.

Solution

  • After some more experiments, I found that the problem lies only in the sox spectrogram effect: it cannot determine the duration of piped input, but it needs the duration to calculate the layout.

    Apparently, that requirement is not documented.

    To specify the duration precisely, we have to give the effect the number of samples in the input.

    I am using ffprobe and jq for that:

    ffprobe -v error -print_format json -select_streams a:0 -show_entries frame=nb_samples input_orig.aac \
        | jq --stream 'select(.[0][2] == "nb_samples")[1]' \
        | jq --slurp 'add'
    

    (yes, there are simpler ways, but they don't scale)

    ...which gives me 4797440 samples for my input files.

    With -d 4797440s added to the spectrogram effect, it now works:

    sox -S -m \
        -v  1 -t s24 -r 48k -c 2 -L "|ffmpeg -v error -i input_orig.aac -vn -f s24le -" \
        -v -1 -t s24 -r 48k -c 2 -L "|ffmpeg -v error -i input_faac.aac -vn -f s24le -" \
        -n \
        spectrogram -d 4797440s -x 1600 -y 480 -o diff.faac.png
    

    New result: [image]
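
The sample count used above can be obtained and cross-checked with a few small shell helpers. This is only a sketch: the function names sum_lines, count_samples and pcm_bytes_to_samples are made up for illustration, and it assumes that ffprobe's csv writer with p=0 prints one bare nb_samples value per frame for this input, which may differ between ffprobe versions.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of sox or ffmpeg.

# Sum one integer per line read from stdin (e.g. per-frame nb_samples values).
sum_lines() {
    awk '{ total += $1 } END { printf "%d\n", total }'
}

# Count samples per channel by summing ffprobe's per-frame nb_samples.
# Assumption: "-of csv=p=0" yields one bare value per line for this input.
count_samples() {
    ffprobe -v error -select_streams a:0 \
        -show_entries frame=nb_samples -of csv=p=0 "$1" | sum_lines
}

# Cross-check: convert a raw PCM byte count to samples per channel.
# usage: pcm_bytes_to_samples BYTES BYTES_PER_SAMPLE CHANNELS
pcm_bytes_to_samples() {
    echo $(( $1 / ($2 * $3) ))
}

# Example (not executed here):
#   sox ... -n spectrogram -d "$(count_samples input_orig.aac)s" -o out.png
```

As a sanity check against the log in the question: ffmpeg reported size=28110kB of 24-bit stereo output. Assuming kB means 1024 bytes, that is 28784640 bytes, and pcm_bytes_to_samples 28784640 3 2 yields 4797440, the same value the ffprobe approach produced.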