
Python read sound file, ogg or wav?


I want to load music in Python and am using soundfile. I noticed that reading an ogg file and a wav file yields different results, as shown below (the wav file is a conversion of the ogg file using ffmpeg). With the code below, I observe a small difference between the ogg and wav data. Is this difference normal?

Edit: I used the following command to convert my file: ffmpeg -i filename.mp3 newfilename.wav

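import soundfile as sf

wav_file = "test_inputs/Shikantaza.wav"  # run again with the .ogg path for the second output
X, sample_rate = sf.read(wav_file)  # X holds float samples, one column per channel
print(wav_file)
print(X[0:20])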

And it outputs:

test_inputs/Shikantaza.wav
[[  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [ -3.05175781e-05  -3.05175781e-05]
 [ -3.05175781e-05   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00]]
test_inputs/Shikantaza.ogg
[[  1.17459308e-06   3.78499834e-07]
 [  5.19584228e-06   2.25495864e-06]
 [  1.13173719e-05   6.28675980e-06]
 [  1.07316619e-05   4.50928837e-06]
 [  2.70867986e-06  -3.40946622e-06]
 [  5.37277947e-06   5.06399772e-07]
 [  3.64179391e-06   6.27796169e-07]
 [ -5.09244865e-06  -6.14764804e-06]
 [ -4.38827237e-06  -3.74127058e-06]
 [ -5.41250847e-06  -3.70974522e-06]
 [ -2.75347884e-06  -7.08531957e-07]
 [ -9.67129495e-07   6.15705801e-07]
 [ -4.91217952e-06  -3.82820826e-06]
 [  4.38740926e-06   6.00675048e-06]
 [ -3.00040119e-06  -4.78463562e-08]
 [ -2.18559871e-05  -1.67418439e-05]
 [ -1.57035538e-05  -8.82137283e-06]
 [ -1.28820702e-05  -5.31934711e-06]
 [ -9.44996100e-06  -8.10974825e-07]
 [ -5.33486082e-06   3.71237797e-06]]

Solution

  • For the first file, the audio is decoded to 16-bit linear PCM when the WAV is written, and soundfile then converts those integers to floating point. For the second file, the audio is decoded directly to floating point. 16-bit linear PCM has less precision than floating point, so the WAV path loses information: every WAV sample is a multiple of 1/32768, which is why the only nonzero value in your WAV output is -3.05175781e-05 (that is, -1/32768). The loss is normally negligible compared to the loss from the lossy compression itself, so it can be ignored.

    Although WAV is most often used with 16-bit linear PCM, it can also store floating-point PCM (the file will be about twice as large). To write floating point to a WAV file:

    ffmpeg -i in.ogg -c:a pcm_f32le out.wav
    

    The decoders for the lossy formats may also differ and produce slightly different results. In addition, a decoder that is not gapless may output only whole frames, which can add a few extra samples at the beginning and/or end.
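The precision point can be illustrated with a small sketch that round-trips a few of the ogg sample values from the question through 16-bit PCM. It assumes a plain scale-by-32768, round-to-nearest conversion; a real encoder may dither or round differently, and the helper names `to_int16` and `from_int16` are made up for this illustration.

```python
# Sketch: round-trip float samples through 16-bit PCM, roughly what happens
# when audio is written to a default WAV and read back as floats.
# Assumption: scale by 32768 and round to nearest, with no dithering.

INT16_SCALE = 32768  # 2**15, the magnitude range of signed 16-bit samples


def to_int16(sample: float) -> int:
    """Quantize a float sample in [-1.0, 1.0) to a signed 16-bit integer."""
    value = round(sample * INT16_SCALE)
    return max(-32768, min(32767, value))  # clip to the int16 range


def from_int16(value: int) -> float:
    """Convert a signed 16-bit integer back to a float sample."""
    return value / INT16_SCALE


# Left-channel values copied from the ogg output in the question
# (rows 0, 2, 15 and 16).
ogg_samples = [1.17459308e-06, 1.13173719e-05, -2.18559871e-05, -1.57035538e-05]

round_tripped = [from_int16(to_int16(s)) for s in ogg_samples]
step = 1 / INT16_SCALE  # smallest nonzero magnitude 16-bit PCM can represent

print(step)           # 3.0517578125e-05
print(round_tripped)  # [0.0, 0.0, -3.0517578125e-05, -3.0517578125e-05]
```

Under that assumption, values smaller than half a step collapse to zero and the rest snap to -1/32768, which matches the pattern of zeros and -3.05175781e-05 in the WAV output above.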