
FFMPEG - How to pipe RMS_level and pts_time metadata without generating unwanted metadata


I am trying to find the loudest (highest rms_level) moment in an audio file, but I need to pipe the metadata rather than write to a file.

I adapted the answer found here: https://superuser.com/questions/1183663/determining-audio-level-peaks-with-ffmpeg

I removed the write-to-file option and redirected the output instead. Here's what I've got:

ffmpeg -i loudSoft.mp3 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level -f null - 2> result.txt

The only problem is that I now get a lot of unwanted output before and after the RMS_level and pts_time data, and every metadata line is prefixed with [Parsed_ametadata_1 @ 0x7f9d42c37500]. None of that appeared when I wrote to a file instead of piping. All I need is the time and the RMS level.

Here is an abridged version of what I get when I write to file:

frame:0    pts:0       pts_time:0
lavfi.astats.Overall.RMS_level=-inf
frame:1    pts:47      pts_time:0.00106576
lavfi.astats.Overall.RMS_level=-165.163347
frame:2    pts:1199    pts_time:0.0271882
lavfi.astats.Overall.RMS_level=-99.736394
frame:3    pts:2351    pts_time:0.0533107
lavfi.astats.Overall.RMS_level=-88.112282
frame:4    pts:3503    pts_time:0.0794331
lavfi.astats.Overall.RMS_level=-86.554314
frame:5    pts:4655    pts_time:0.105556
lavfi.astats.Overall.RMS_level=-82.977501
frame:6    pts:5807    pts_time:0.131678
lavfi.astats.Overall.RMS_level=-79.698739
frame:7    pts:6959    pts_time:0.1578
lavfi.astats.Overall.RMS_level=-76.629393
frame:8    pts:8111    pts_time:0.183923
lavfi.astats.Overall.RMS_level=-71.581211
frame:9    pts:9263    pts_time:0.210045
lavfi.astats.Overall.RMS_level=-75.038503
frame:10   pts:10415   pts_time:0.236168

And here is what I get when piping:

ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.16)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.2.2_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/adoptopenjdk-13.0.1.jdk/Contents/Home/include/darwin -fno-stack-check' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mp3, from 'loudSoft2.mp3':
  Metadata:
    encoder         : Lavf58.29.100
  Duration: 00:00:09.85, start: 0.025057, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.54
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:0    pts:0       pts_time:0
[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-inf
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:1    pts:47      pts_time:0.00106576
[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-165.163347
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:2    pts:1199    pts_time:0.0271882
[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-99.736394
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:3    pts:2351    pts_time:0.0533107


*** MIDDLE OMITTED FOR BREVITY ***


[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-88.532185
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:375  pts:430895  pts_time:9.77086
[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-88.594276
[Parsed_ametadata_1 @ 0x7f9d42c37500] frame:376  pts:432047  pts_time:9.79698
[Parsed_ametadata_1 @ 0x7f9d42c37500] lavfi.astats.Overall.RMS_level=-88.654138
size=N/A time=00:00:09.82 bitrate=N/A speed=82.6x    
video:0kB audio:1692kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_astats_0 @ 0x7f9d42c37280] Channel: 1
[Parsed_astats_0 @ 0x7f9d42c37280] DC offset: 0.000001
[Parsed_astats_0 @ 0x7f9d42c37280] Min level: -0.000106
[Parsed_astats_0 @ 0x7f9d42c37280] Max level: 0.000115
[Parsed_astats_0 @ 0x7f9d42c37280] Min difference: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Max difference: 0.000077
[Parsed_astats_0 @ 0x7f9d42c37280] Mean difference: 0.000017
[Parsed_astats_0 @ 0x7f9d42c37280] RMS difference: 0.000022
[Parsed_astats_0 @ 0x7f9d42c37280] Peak level dB: -78.752617
[Parsed_astats_0 @ 0x7f9d42c37280] RMS level dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS peak dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS trough dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] Crest factor: 3.126627
[Parsed_astats_0 @ 0x7f9d42c37280] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Peak count: 2
[Parsed_astats_0 @ 0x7f9d42c37280] Bit depth: 32/32
[Parsed_astats_0 @ 0x7f9d42c37280] Dynamic range: 76.274252
[Parsed_astats_0 @ 0x7f9d42c37280] Zero crossings: 246
[Parsed_astats_0 @ 0x7f9d42c37280] Zero crossings rate: 0.222624
[Parsed_astats_0 @ 0x7f9d42c37280] Number of NaNs: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Number of Infs: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Number of denormals: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Channel: 2
[Parsed_astats_0 @ 0x7f9d42c37280] DC offset: 0.000001
[Parsed_astats_0 @ 0x7f9d42c37280] Min level: -0.000106
[Parsed_astats_0 @ 0x7f9d42c37280] Max level: 0.000115
[Parsed_astats_0 @ 0x7f9d42c37280] Min difference: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Max difference: 0.000077
[Parsed_astats_0 @ 0x7f9d42c37280] Mean difference: 0.000017
[Parsed_astats_0 @ 0x7f9d42c37280] RMS difference: 0.000022
[Parsed_astats_0 @ 0x7f9d42c37280] Peak level dB: -78.752617
[Parsed_astats_0 @ 0x7f9d42c37280] RMS level dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS peak dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS trough dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] Crest factor: 3.126627
[Parsed_astats_0 @ 0x7f9d42c37280] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Peak count: 2
[Parsed_astats_0 @ 0x7f9d42c37280] Bit depth: 32/32
[Parsed_astats_0 @ 0x7f9d42c37280] Dynamic range: 76.274252
[Parsed_astats_0 @ 0x7f9d42c37280] Zero crossings: 246
[Parsed_astats_0 @ 0x7f9d42c37280] Zero crossings rate: 0.222624
[Parsed_astats_0 @ 0x7f9d42c37280] Number of NaNs: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Number of Infs: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Number of denormals: 0
[Parsed_astats_0 @ 0x7f9d42c37280] Overall
[Parsed_astats_0 @ 0x7f9d42c37280] DC offset: 0.000001
[Parsed_astats_0 @ 0x7f9d42c37280] Min level: -0.000106
[Parsed_astats_0 @ 0x7f9d42c37280] Max level: 0.000115
[Parsed_astats_0 @ 0x7f9d42c37280] Min difference: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Max difference: 0.000077
[Parsed_astats_0 @ 0x7f9d42c37280] Mean difference: 0.000017
[Parsed_astats_0 @ 0x7f9d42c37280] RMS difference: 0.000022
[Parsed_astats_0 @ 0x7f9d42c37280] Peak level dB: -78.752617
[Parsed_astats_0 @ 0x7f9d42c37280] RMS level dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS peak dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] RMS trough dB: -88.654138
[Parsed_astats_0 @ 0x7f9d42c37280] Flat factor: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Peak count: 2.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Bit depth: 32/32
[Parsed_astats_0 @ 0x7f9d42c37280] Number of samples: 1105
[Parsed_astats_0 @ 0x7f9d42c37280] Number of NaNs: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Number of Infs: 0.000000
[Parsed_astats_0 @ 0x7f9d42c37280] Number of denormals: 0.000000

Solution

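  • ffmpeg writes all of its log messages to stderr. When the file option is omitted, the ametadata filter sends its output to the logger, where it is printed among the other log lines. That is why each metadata line carries the [Parsed_ametadata_1 @ 0x7f9d42c37500] prefix.

    Instead, set file=- on the ametadata filter (ametadata=print:key=lavfi.astats.Overall.RMS_level:file=-) and capture stdout.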
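
As a rough illustration of the file=- approach, here is a minimal Python sketch that runs ffmpeg, reads the ametadata output from stdout, and picks the frame with the highest RMS level. The helper names (build_command, parse_rms, loudest, measure) are made up for this example. The -hide_banner and -nostats flags and the discarding of stderr are assumptions added to keep the remaining log output out of the way; they are not required for file=- to work.

```python
import math
import re
import subprocess

# Matches the timestamp on lines like "frame:1    pts:47      pts_time:0.00106576".
FRAME_RE = re.compile(r"pts_time:(\S+)")
RMS_KEY = "lavfi.astats.Overall.RMS_level="


def build_command(path):
    """Build the ffmpeg argument list; file=- makes ametadata print to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        "-af",
        "astats=metadata=1:reset=1,"
        "ametadata=print:key=lavfi.astats.Overall.RMS_level:file=-",
        "-f", "null", "-",
    ]


def parse_rms(text):
    """Turn ametadata output into a list of (pts_time, rms_level) tuples."""
    pairs = []
    pts_time = None
    for line in text.splitlines():
        line = line.strip()
        match = FRAME_RE.search(line)
        if match:
            # Remember the timestamp until the matching RMS line arrives.
            pts_time = float(match.group(1))
        elif line.startswith(RMS_KEY) and pts_time is not None:
            # float() accepts "-inf", which astats reports for silent frames.
            pairs.append((pts_time, float(line[len(RMS_KEY):])))
            pts_time = None
    return pairs


def loudest(pairs):
    """Return the (pts_time, rms_level) pair with the highest finite RMS level."""
    finite = [pair for pair in pairs if math.isfinite(pair[1])]
    return max(finite, key=lambda pair: pair[1]) if finite else None


def measure(path):
    """Run ffmpeg on `path` and return its loudest frame (requires ffmpeg on PATH)."""
    result = subprocess.run(
        build_command(path),
        stdout=subprocess.PIPE,      # the metadata arrives here
        stderr=subprocess.DEVNULL,   # banner, stream info and astats summary go here
        text=True,
        check=True,
    )
    return loudest(parse_rms(result.stdout))
```

Calling measure("loudSoft.mp3") would return a (pts_time, rms_level) tuple, or None if no frame has a finite RMS level. The same parse_rms function also works on the text that ametadata writes when file is set to a regular file name.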