
How can I convert audio to WAVE_FORMAT_PCM using FFmpeg?


I am using Python's wave module to read audio, and FFmpeg to convert audio from other formats to WAV. However, I have run into a problem.

I wrote v.py to generate a silent audio file, a.wav:

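import wave
import numpy as np

# 44100 zero-valued 16-bit samples (about 0.46 s of silence at 96 kHz)
wave_data = np.zeros(44100, dtype=np.int16)

f = wave.open('a.wav', 'wb')
f.setnchannels(1)   # mono
f.setsampwidth(2)   # 2 bytes per sample = 16-bit
f.setframerate(96000)
# tobytes() replaces tostring(), which is deprecated and removed in NumPy 2.0
f.writeframes(wave_data.tobytes())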
f.close()

Then I used FFmpeg to "copy" a.wav to b.wav (though it seems to encode / decode the file), but I can only read a.wav with Python; b.wav cannot be opened.

[user@localhost tmp]$ ffmpeg -i a.wav b.wav
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'a.wav':
  Duration: 00:00:00.46, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 96000 Hz, mono, s16, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'b.wav':
  Metadata:
    ISFT            : Lavf57.71.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 96000 Hz, mono, s16, 1536 kb/s
    Metadata:
      encoder         : Lavc57.89.100 pcm_s16le
size=      86kB time=00:00:00.45 bitrate=1537.8kbits/s speed= 706x    
video:0kB audio:86kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.115646%
[user@localhost tmp]$ python3
Python 3.6.4 (default, Jan 23 2018, 22:25:37) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import wave
>>> wave.open('a.wav')
<wave.Wave_read object at 0x7efea1c5e550>
>>> wave.open('b.wav')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/wave.py", line 499, in open
    return Wave_read(f)
  File "/usr/lib64/python3.6/wave.py", line 163, in __init__
    self.initfp(f)
  File "/usr/lib64/python3.6/wave.py", line 143, in initfp
    self._read_fmt_chunk(chunk)
  File "/usr/lib64/python3.6/wave.py", line 260, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 65534
>>> 

How should I change the FFmpeg command so that b.wav is written as WAVE_FORMAT_PCM and can be read with Python?


Solution

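  • The issue is not the sampling rate as such: a.wav is also 96 kHz and opens fine. The format tag in the error, 65534 (0xFFFE), is WAVE_FORMAT_EXTENSIBLE, which ffmpeg's WAV muxer writes in place of WAVE_FORMAT_PCM (tag 1) when the output sampling rate is above 48 kHz. The wave module in the Python version shown (3.6) accepts only WAVE_FORMAT_PCM; support for WAVE_FORMAT_EXTENSIBLE headers was added in Python 3.12. The MP3 intermediation route works because ffmpeg, in this case, automatically downsamples inputs to 48 kHz, so the resulting WAV gets a plain PCM header. Reportedly, scipy can import 48+ kHz files.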

    The syntax for manually downsampling to 48 kHz with ffmpeg is

    ffmpeg -i in -ar 48000 out.wav
    

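    P.S. To skip decoding/encoding, use ffmpeg -i in.wav -c copy out.wav. This only avoids re-encoding the samples; the header is still written by the muxer, so it does not by itself make a 96 kHz file readable by the wave module here.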
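
To check which format tag a given WAV file carries without going through the wave module, the 'fmt ' chunk can be read directly. The following is a minimal sketch, not part of the original answer: the function name read_format_tag and the two constants are illustrative, and it assumes a well-formed little-endian RIFF/WAVE file.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # 65534, the value shown in the traceback


def read_format_tag(path):
    """Return wFormatTag from the 'fmt ' chunk of a RIFF/WAVE file.

    Hypothetical helper for diagnosis only; it walks the chunk list and
    does not validate anything beyond the RIFF/WAVE signature.
    """
    with open(path, 'rb') as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError('file too short to be RIFF/WAVE')
        riff, _size, wave_id = struct.unpack('<4sI4s', header)
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file')
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack('<4sI', chunk_header)
            if chunk_id == b'fmt ':
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    raise ValueError("truncated 'fmt ' chunk")
                # wFormatTag is the first 16-bit field of the chunk.
                return struct.unpack('<H', tag_bytes)[0]
            # Skip other chunks; chunk data is padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Given the session in the question, read_format_tag('a.wav') should return WAVE_FORMAT_PCM, and read_format_tag('b.wav') should return the 65534 reported in the error. Running it on the output of the -ar 48000 command is a quick way to confirm that the header is one the wave module accepts.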