Tags: python, audio, nlp, operating-system

Using os.system() to convert the sample rate of audio files


I have started working on an NLP project, and the first step is to downsample my audio files. I found a script that does this automatically, and although I can use it to downsample my audio, I'm struggling to understand how it works.

import os


def convert_audio(audio_path, target_path, remove=False):
    """This function sets the audio `audio_path` to:
        - 16000Hz Sampling rate
        - one audio channel ( mono )
            Params:
                audio_path (str): the path of audio wav file you want to convert
                target_path (str): target path to save your new converted wav file
                remove (bool): whether to remove the old file after converting
        Note that this function requires ffmpeg installed in your system."""

    os.system(f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}")
    # os.system(f"ffmpeg -i {audio_path} -ac 1 {target_path}")
    if remove:
        os.remove(audio_path)

This is the code that's giving me trouble. I don't understand how the fourth line from the bottom (the os.system() call) works; I believe that is the line that resamples the audio files.

The repo this code comes from: https://github.com/x4nth055/pythoncode-tutorials/

If anyone can explain how this works, or knows of a better way to downsample audio files, I'd love to hear it. Thanks!


Solution

  • Have you used ffmpeg before? Its documentation describes both options, although some audio background helps in understanding them:

    -ac[:stream_specifier] channels (input/output,per-stream) Set the number of audio channels. For output streams it is set by default to the number of input audio channels. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

    -ar[:stream_specifier] freq (input/output,per-stream) Set the audio sampling frequency. For output streams it is set by default to the frequency of the corresponding input stream. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.

    As for os.system(), the Python documentation says:

    Execute the command (a string) in a subshell...on Windows, the return value is that returned by the system shell after running command. The shell is given by the Windows environment variable COMSPEC: it is usually cmd.exe, which returns the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

    To see exactly what gets run, I suggest printing the command before executing it:

    cmd_str = f"ffmpeg -i {audio_path} -ac 1 -ar 16000 {target_path}"
    print(cmd_str)  # copy and paste the printed command into cmd/bash to run it by hand
    os.system(cmd_str)
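
On the "better ways" part of the question, here is a minimal sketch of the same conversion using subprocess.run() with an argument list instead of a shell string. The helper name build_ffmpeg_args and its channels and sample_rate parameters are made up for illustration and are not part of the linked repo. The comments spell out what each piece of the original command does. Because no shell is involved, paths containing spaces reach ffmpeg intact, and check=True stops the function before it deletes the original if ffmpeg fails.

```python
import os
import subprocess


def build_ffmpeg_args(audio_path, target_path, channels=1, sample_rate=16000):
    """Return the ffmpeg command as a list of arguments (hypothetical helper)."""
    return [
        "ffmpeg",
        "-i", str(audio_path),    # input file
        "-ac", str(channels),     # number of output audio channels (1 = mono)
        "-ar", str(sample_rate),  # output sampling frequency in Hz
        str(target_path),         # output file
    ]


def convert_audio(audio_path, target_path, remove=False):
    """Convert `audio_path` to 16000 Hz mono and save it as `target_path`.

    Requires ffmpeg to be installed and available on PATH.
    """
    args = build_ffmpeg_args(audio_path, target_path)
    # check=True raises subprocess.CalledProcessError when ffmpeg exits with
    # a non-zero status, so the lines below only run after a successful call.
    subprocess.run(args, check=True)
    if remove:
        os.remove(audio_path)
```

Calling convert_audio("speech.wav", "speech_16k.wav") would then run the same ffmpeg command as the original script (the file names here are placeholders).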