Search code examples
pythonffmpeg

Ffmpeg - transform mulaw 8000khz audio buffer data into valid bytes format


I'm trying to read a bytes variable using ffmpeg, but the audio stream I listen to, sends me buffer data in mulaw encoded buffer like this:

https://github.com/boblp/mulaw_buffer_data/blob/main/buffer_data

I'm having trouble running the ffmpeg_read function from the transformers library found here:

def ffmpeg_read(bpayload: bytes, sampling_rate: int) -> np.array:
"""
Helper function to read an audio file through ffmpeg.
"""
ar = f"{sampling_rate}"
ac = "1"
format_for_conversion = "f32le"
ffmpeg_command = [
    "ffmpeg",
    "-i",
    "pipe:0",
    "-ac",
    ac,
    "-ar",
    ar,
    "-f",
    format_for_conversion,
    "-hide_banner",
    "-loglevel",
    "quiet",
    "pipe:1",
]

try:
    with subprocess.Popen(ffmpeg_command, stdin=subprocess.PIPE, stdout=subprocess.PIPE) as ffmpeg_process:
        output_stream = ffmpeg_process.communicate(bpayload)
except FileNotFoundError as error:
    raise ValueError("ffmpeg was not found but is required to load audio files from filename") from error
out_bytes = output_stream[0]
audio = np.frombuffer(out_bytes, np.float32)
if audio.shape[0] == 0:
    raise ValueError(
        "Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has "
        "a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote "
        "URL, ensure that the URL is the full address to **download** the audio file."
    )
return audio

But everytime I get:

raise ValueError(
    "Soundfile is either not in the correct format or is malformed. Ensure that the soundfile has "
    "a valid audio file extension (e.g. wav, flac or mp3) and is not corrupted. If reading from a remote "
    "URL, ensure that the URL is the full address to **download** the audio file."
)

If I grab any wav file I can do something like this:

import wave

with open('./emma.wav', 'rb') as fd:
    contents = fd.read()
    print(contents)

And running it through the function does work!

So my question would be:

How can I transform my mulaw encoded buffer data into a valid bytes format that works with ffmpeg_read()?

EDIT: I've found a way using pywav (https://pypi.org/project/pywav/)

# 1 stands for mono channel, 8000 sample rate, 8 bit, 7 stands 
for MULAW encoding
wave_write = pywav.WavWrite("filename.wav", 1, 8000, 8, 7)
wave_write.write(mu_encoded_data)

wave_write.close()

This is the result: https://github.com/boblp/mulaw_buffer_data/blob/main/filename.wav

the background noise is acceptable.

However, I want to use a FFMPEG instead to avoid creating a tmp file.


Solution

  • This worked for me:

    import subprocess
    import numpy as np
    import io
    
    def ffmpeg_read_mulaw(bpayload: bytes, sampling_rate: int) -> np.array:
        ar = f"{sampling_rate}"
        ac = "1"
        format_for_conversion = "f32le"
        ffmpeg_command = [
            "ffmpeg",
            "-f",
            "mulaw",
            "-ar",
            ar,
            "-ac",
            ac,
            "-i",
            "pipe:0",
            "-b:a",#change the bitrate
            "256k", #change the bitrate to 256k
            "-f",
            format_for_conversion,
            "-hide_banner",
            "-loglevel",
            "quiet",
            "pipe:1",
        ]
    
        try:
            with subprocess.Popen(ffmpeg_command, stdin=subprocess.PIPE, stdout=subprocess.PIPE) as ffmpeg_process:
                output_stream = ffmpeg_process.communicate(bpayload)
        except FileNotFoundError as error:
            raise ValueError("ffmpeg was not found but is required to load audio files from filename") from error
    
        out_bytes = output_stream[0]
        audio = np.frombuffer(out_bytes, np.float32)
    
        if audio.shape[0] == 0:
            raise ValueError("Failed to decode mu-law encoded data with FFMPEG.")
    
        return audio
    
    # Example usage:
    # mu_encoded_data is your mu-law encoded buffer data
    mu_encoded_data = b"\x7F\xFF\x80\x01\x7F\xFF"
    sampling_rate = 8000
    decoded_audio = ffmpeg_read_mulaw(mu_encoded_data, sampling_rate)
    print(decoded_audio)