
More precision from ffmpeg silencedetect


I am trying to split a very large (70 hours) mp3 file into smaller files. My first step is to get the timestamps using the silencedetect filter in ffmpeg. It works fine for the first few timestamps, but unfortunately the results are rounded to six significant digits.

The command I am running is:

ffmpeg -i input.mp3 -af silencedetect=d=3 -hide_banner -nostats -f null -

My results are:

Input #0, mp3, from 'input.mp3':
  Duration: 70:46:05.32, start: 0.050113, bitrate: 64 kb/s
    Stream #0:0: Audio: mp3, 22050 Hz, stereo, fltp, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le, 22050 Hz, stereo, s16, 705 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
[silencedetect @ 0x5590d08bd700] silence_start: 10.6895
[silencedetect @ 0x5590d08bd700] silence_end: 15.0054 | silence_duration: 4.31587
[silencedetect @ 0x5590d08bd700] silence_start: 446.958
[silencedetect @ 0x5590d08bd700] silence_end: 450.959 | silence_duration: 4.00168
[silencedetect @ 0x5590d08bd700] silence_start: 1168.17
[silencedetect @ 0x5590d08bd700] silence_end: 1172.17 | silence_duration: 4.00694
[silencedetect @ 0x5590d08bd700] silence_start: 1880.8
[silencedetect @ 0x5590d08bd700] silence_end: 1884.8 | silence_duration: 3.99265

...

[silencedetect @ 0x5590d08bd700] silence_start: 123108
[silencedetect @ 0x5590d08bd700] silence_end: 123111 | silence_duration: 3.61946
[silencedetect @ 0x5590d08bd700] silence_start: 123286
[silencedetect @ 0x5590d08bd700] silence_end: 123290 | silence_duration: 4.01646
[silencedetect @ 0x5590d08bd700] silence_start: 124229
[silencedetect @ 0x5590d08bd700] silence_end: 124233 | silence_duration: 4.01846
[silencedetect @ 0x5590d08bd700] silence_start: 124442
[silencedetect @ 0x5590d08bd700] silence_end: 124446 | silence_duration: 4.0298

...

Once the timestamps pass 100,000 seconds, six significant digits means rounding to the nearest second, which is not sufficient for my purposes. Ideally, I would like each timestamp to be accurate to a hundredth of a second or so. Does anybody know a way to achieve this?


Solution

  • Append ametadata=print:file=- to the filterchain and parse stdout in your program. For each frame that carries metadata, it prints the frame number, the pts, and the time in seconds. Grab the time_base from ffprobe and multiply it by the integer pts to compute an accurate time.

    If you're using Python, you can try the following with my ffmpegio package:

    from ffmpegio import analyze as ffa, probe as ffp
    from pprint import pprint

    input_file = "BigBuckBunny.mp4"

    # Look up the time_base of the first audio stream
    tb = next(info for info in ffp.streams_basic(input_file)
              if info["codec_type"] == "audio")["time_base"]
    print(f'time_base = {tb} s')

    # Analyze the first 5 minutes and collect the silent intervals as pts pairs
    (logger,) = ffa.run(input_file, ffa.SilenceDetect(d=1), time_units="pts", to=60 * 5)

    # Convert each (start, end) pts pair to seconds
    pprint([(pts0 * tb, pts1 * tb) for pts0, pts1 in logger.output.interval])

    This returns the silent intervals in seconds, as exact fractions:

    time_base = 1/44100 s
    [(Fraction(947456, 11025), Fraction(958976, 11025)),
     (Fraction(976384, 11025), Fraction(39680, 441)),
     (Fraction(1018624, 11025), Fraction(146176, 1575))]
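
If you would rather not depend on ffmpegio, the parsing step described above can be sketched with the standard library alone. This is only a rough sketch under a few assumptions: the output of ametadata=print:file=- is assumed to be a "frame:… pts:… pts_time:…" line followed by "key=value" lines, the names parse_silence_events, to_timestamp and SAMPLE are made up for illustration, and the pts numbers in SAMPLE are invented rather than copied from a real run. The pts belongs to the frame that carries the tag, so check how closely it tracks the actual silence edge for your file, and check which time base the printed pts values use (for decoded audio it may be 1/sample_rate rather than the value the container reports).

```python
import re
from fractions import Fraction

# Assumed line layout of `ametadata=print:file=-` output (not verified here).
FRAME_RE = re.compile(r"frame:\s*(\d+)\s+pts:\s*(-?\d+)\s+pts_time:\s*(\S+)")
TAG_RE = re.compile(r"lavfi\.silence_(start|end)=(\S+)")

# Hypothetical sample text: the pts values are invented for illustration.
SAMPLE = """\
frame:460  pts:235520  pts_time:10.6812
lavfi.silence_start=10.6895
frame:646  pts:330752  pts_time:15
lavfi.silence_end=15.0054
lavfi.silence_duration=4.31587
"""


def parse_silence_events(lines, time_base):
    """Return (kind, exact_frame_time, rounded_text) for each start/end tag.

    exact_frame_time is pts * time_base for the frame line that precedes
    the tag; rounded_text is the six-digit value ffmpeg printed.
    """
    events = []
    pts = None
    for line in lines:
        line = line.strip()
        frame = FRAME_RE.match(line)
        if frame:
            pts = int(frame.group(2))  # remember the pts of the current frame
            continue
        tag = TAG_RE.match(line)
        if tag and pts is not None:
            events.append((tag.group(1), pts * time_base, tag.group(2)))
    return events


def to_timestamp(seconds):
    """Format a Fraction of seconds as HH:MM:SS.cc (hundredths of a second)."""
    centis = round(seconds * 100)
    hours, rem = divmod(centis, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, hundredths = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{hundredths:02d}"


# Assumed time base of 1/22050 s, matching the 22050 Hz stream in the question.
for kind, exact, rounded in parse_silence_events(SAMPLE.splitlines(),
                                                 Fraction(1, 22050)):
    print(kind, to_timestamp(exact), "(ffmpeg printed", rounded + ")")
```

Because the times stay as Fraction values until the final formatting step, nothing is lost at large offsets: a pts of 2714541210 with a time base of 1/22050 formats as 34:11:48.44, where the six-digit log value would only say 123108.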