Tags: python, mp3, ffprobe, sample-rate, ffmpegio

Why does the number of samples reported by `ffmpegio` differ between two methods?


I have an MP3 file with a sample rate of 44100 Hz; let's call it a.mp3.

Using the Python library ffmpegio with the following code, I get a total of 290704 samples.

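import ffmpegio

file = 'a.mp3'

with ffmpegio.open(file, 'ra', blocksize=16, sample_fmt='dbl') as file_opened:
    for i, indata in enumerate(file_opened):
        pass  # do some stuff with each block

# Assumes every block, including the last one, holds exactly 16 samples
print((i + 1) * 16)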

But,

ffmpegio.probe.audio_streams_basic('a.mp3')[0]['nb_samples']

gives me 292608

How can the difference between 292608 and 290704 be explained?

I searched the ffmpegio documentation but found nothing that caught my attention:

https://python-ffmpegio.github.io/python-ffmpegio/

Thanks.

I tried to find out more about how the total number of samples is calculated in ffmpegio.probe.audio_streams_basic().

Also, the difference does not seem to grow as the MP3 file gets longer.


Solution

  • (I'm the ffmpegio dev)

    As far as I can tell, the most likely culprit is audio_streams_basic() reporting an incorrect nb_samples. When ffprobe does not return nb_samples, audio_streams_basic() computes it from ffprobe's other outputs. Its calculation is rudimentary at the moment: it is based only on the duration timestamp, possibly ignoring the start time and other timings.
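To make the size of the mismatch concrete, here is a minimal arithmetic sketch using only the numbers from the question. The helper `estimate_nb_samples` is hypothetical and only illustrates a duration-only estimate; it is not ffmpegio's actual implementation.

```python
SAMPLE_RATE = 44100  # Hz, as stated in the question


def estimate_nb_samples(duration_s, sample_rate):
    """Hypothetical duration-only estimate.

    Ignores start time, encoder delay, and padding, which is the kind of
    simplification the answer describes.
    """
    return round(duration_s * sample_rate)


probed = 292608   # nb_samples reported by audio_streams_basic()
decoded = 290704  # samples counted from the stream reader

# A duration-only estimate reproduces the probed count exactly.
duration_s = probed / SAMPLE_RATE
estimate = estimate_nb_samples(duration_s, SAMPLE_RATE)

# Size of the mismatch, in samples and in milliseconds.
gap = probed - decoded
gap_ms = 1000 * gap / SAMPLE_RATE

print(estimate, gap, round(gap_ms, 1))  # 292608 1904 43.2
print(divmod(probed, 1152))             # (254, 0)
```

The gap is a fixed 1904 samples (about 43 ms), which fits the observation that it does not grow with file length. The probed count is also an exact multiple of 1152, the number of samples in an MPEG-1 Layer III frame. This may mean the probe counts whole frames while the decoder trims some samples, but that reading is an assumption, not something confirmed here.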

    Try

    from pprint import pprint

    import ffmpegio

    pprint(ffmpegio.probe.full_details(file, select_streams=0))
    

    and check whether nb_samples appears in the stream info. If it is not included, please report the issue on GitHub, with the stream info copied and pasted.

    Also, you can try

    fs, x = ffmpegio.audio.read(file)
    print(x.shape)
    

    to make sure the sample count matches that of the stream reader.

    Finally, to address the posted comments: the stream reader is guaranteed to return blocksize samples at a time, except for the last block.
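Because the last block may be shorter than blocksize, multiplying the number of blocks by blocksize can overcount. The sketch below sums the actual block lengths instead. It uses a synthetic generator (`make_blocks`, a hypothetical stand-in for the stream reader) so that it runs without an audio file, and it assumes each block supports `len()` along the sample axis, as a NumPy array of shape (samples, channels) would.

```python
def count_samples(blocks):
    """Sum the true length of every block rather than assuming a fixed size."""
    total = 0
    for block in blocks:
        total += len(block)
    return total


def make_blocks(n_samples, blocksize):
    """Hypothetical stand-in for a stream reader: full blocks, then a short tail."""
    for start in range(0, n_samples, blocksize):
        yield [0.0] * min(blocksize, n_samples - start)


blocks = list(make_blocks(100, 16))  # 6 full blocks of 16 plus one block of 4

naive = len(blocks) * 16       # what (i + 1) * 16 would report
exact = count_samples(blocks)  # sum of the actual block lengths

print(naive, exact)  # 112 100
```

With a real file, the same function could be applied to the reader directly, for example `count_samples(file_opened)` inside the `with ffmpegio.open(...)` block, assuming the blocks it yields behave as described above.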