
Save a Twilio <Stream> in 8 kHz `mulaw` format to a file


A couple of posts address this question, but I haven't been able to play back a saved file successfully: it usually plays at half speed.

Convert 8kHz mulaw to 16KHz PCM in real time

Following the accepted answer to the question above, I base64-decoded the audio payloads in Go and saved the raw bytes:

// Media event
type Media struct {
    Track     string `json:"track"`
    Chunk     string `json:"chunk"`
    Timestamp string `json:"timestamp"`
    Payload   string `json:"payload"`
}

// SaveAudio will upgrade connection to websocket and save the audio to file
func SaveAudio(w http.ResponseWriter, r *http.Request) {
    utility.DebugLogf("SaveAudio")
    c, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Print("upgrade:", err)
        return
    }

    defer utility.SafeClose(c)
    inBuf := bytes.Buffer{}

    loop := true
    for loop {
        _, message, err := c.ReadMessage()
        utility.PanicIfErr(err)
        decMessage := TwilioWSSMessage{}
        err = json.Unmarshal(message, &decMessage)
        utility.PanicIfErr(err)

        switch decMessage.Event {
        case "connected":
            utility.DebugLogf("Connected a %s protocol version:%s", decMessage.Protocol, decMessage.Version)
        case "start":
            utility.DebugLogf("Starting audio stream: %#v", decMessage.Start)
        case "media":
            chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
            utility.PanicIfErr(err)
            _, err = inBuf.Write(chunk)
            utility.PanicIfErr(err)
        case "stop":
            utility.DebugLogf("Ending audio stream: %#v", decMessage.Stop)
            loop = false
        default:
            utility.LogWarningf("Unrecognized event type: %s", decMessage.Event)
            loop = false
        }
    }

    saveRaw(&inBuf)
}

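// saveRaw writes the buffered mu-law bytes to out.ulaw.
func saveRaw(buf *bytes.Buffer) {
    rawOut, err := os.Create("out.ulaw")
    utility.PanicIfErr(err)
    defer rawOut.Close() // release the file handle once the write finishes

    _, err = rawOut.Write(buf.Bytes())
    utility.PanicIfErr(err)
}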

Then I used ffmpeg to convert from mulaw to the default pcm_s16le:

ffmpeg -f mulaw -ar 8000 -ac 1 -i out.ulaw mulaw_decoded.wav 
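
For anyone who wants to do this step without ffmpeg, the mu-law to 16-bit PCM expansion can be sketched in Go roughly as follows. This is a minimal sketch of the standard G.711 mu-law expansion, not ffmpeg's actual code; the function names and the out.s16le output file are made up for illustration, and the result is headerless PCM rather than a WAV file.

```go
package main

import (
    "encoding/binary"
    "fmt"
    "os"
)

// decodeMulaw expands one G.711 mu-law byte to a signed 16-bit linear sample.
func decodeMulaw(b byte) int16 {
    u := ^b // mu-law bytes are stored bit-inverted
    exponent := (u >> 4) & 0x07
    mantissa := u & 0x0F
    // Rebuild the magnitude: add the 0x84 bias, scale by the segment, remove the bias.
    sample := ((int(mantissa) << 3) + 0x84) << exponent
    sample -= 0x84
    if u&0x80 != 0 { // sign bit set means a negative sample
        return int16(-sample)
    }
    return int16(sample)
}

// mulawToPCM converts a mu-law buffer to little-endian signed 16-bit PCM bytes.
func mulawToPCM(in []byte) []byte {
    out := make([]byte, 2*len(in))
    for i, b := range in {
        binary.LittleEndian.PutUint16(out[2*i:], uint16(decodeMulaw(b)))
    }
    return out
}

func main() {
    // out.ulaw is the file written by saveRaw; out.s16le is a hypothetical output name.
    raw, err := os.ReadFile("out.ulaw")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    if err := os.WriteFile("out.s16le", mulawToPCM(raw), 0o644); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}
```

The headerless output can then be wrapped in a WAV container with ffmpeg -f s16le -ar 8000 -ac 1 -i out.s16le out.wav.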

Then I upsampled the audio from 8 kHz to 16 kHz and played it with VLC:

ffmpeg -i mulaw_decoded.wav -ar 16000 upsampled.wav && vlc upsampled.wav

But it plays at half speed.

Ultimately I'd like to do it all in Rust or Go, but I can't even get it working locally with just ffmpeg.

Thanks in advance.


Here is the output of the two ffmpeg operations above combined into one command, using the suggested soxr resampler:

cmd:

ffmpeg -y -loglevel verbose -f mulaw -ar 8000 -ac 1 -bits_per_raw_sample 8 -i testsamples/raw_mulaw_bytes -af aresample=resampler=soxr -ar 16000 upsampled.wav

output:

[mulaw @ 0x7fecc0814000] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, mulaw, from 'testsamples/raw_mulaw_bytes':
  Duration: 00:00:20.74, bitrate: 64 kb/s
    Stream #0:0: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[graph_0_in_0_0 @ 0x7fecc0505600] tb:1/8000 samplefmt:s16 samplerate:8000 chlayout:0x4
[Parsed_aresample_0 @ 0x7fecc0505280] ch:1 chl:mono fmt:s16 r:8000Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
Output #0, wav, to 'upsampled.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
No more output streams to write to, finishing.
size=     648kB time=00:00:20.74 bitrate= 256.0kbits/s speed=1.55e+03x
video:0kB audio:648kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.011753%
Input file #0 (testsamples/raw_mulaw_bytes):
  Input stream #0:0 (audio): 519 packets read (165920 bytes); 519 frames decoded (165920 samples);
  Total: 519 packets (165920 bytes) demuxed
Output file #0 (upsampled.wav):
  Output stream #0:0 (audio): 200 frames encoded (331840 samples); 200 packets muxed (663680 bytes);
  Total: 200 packets (663680 bytes) muxed
[AVIOContext @ 0x7fecc0433cc0] Statistics: 4 seeks, 6 writeouts
[AVIOContext @ 0x7fecc042a6c0] Statistics: 165920 bytes read, 0 seeks

The audio sounds the same as before: it still plays at half speed.
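
One sanity check on the log above: raw mono mu-law uses one byte per sample, so the input byte count gives the duration directly. The following is a small Go sketch of that arithmetic; the helper name mulawDuration is hypothetical. If the result is noticeably longer than the real call, the file holds more audio than a single 8 kHz mono stream would produce.

```go
package main

import (
    "fmt"
    "time"
)

// mulawDuration returns the playback length of raw mono mu-law data,
// which uses exactly one byte per sample.
func mulawDuration(numBytes, sampleRate int) time.Duration {
    return time.Duration(numBytes) * time.Second / time.Duration(sampleRate)
}

func main() {
    // 165920 is the input byte count that ffmpeg reported in the log.
    fmt.Println(mulawDuration(165920, 8000)) // 20.74s
}
```

The computed 20.74 s matches the duration ffmpeg reports, so the resampling itself is consistent with the bytes in the file.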


Solution

  • I finally figured it out thanks to this answer: Slow motion effect when decoding OPUS audio stream

    which mentions:

    Another possible reason of "slow motion" is more than one stream decoded by the same decoder. But in this case you also get distorted slow audio.

    The stream for this call carries two tracks, inbound and outbound, and my code was appending the chunks of both to the same buffer. The fix is to save just one track in the case "media": branch:

    if decMessage.Media.Track == "outbound" {
      chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
      utility.PanicIfErr(err)
      _, err = outboundBuf.Write(chunk)
      utility.PanicIfErr(err)
    }
    

    With that change, the ffmpeg commands work as expected.
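
If both sides of the call are needed, one option is to keep a separate buffer per track instead of discarding one. The following is a minimal Go sketch under some assumptions: the message shape is reduced to the event, track and payload fields seen in the question, and the names streamMessage, trackBuffers and add are hypothetical, not part of any Twilio library.

```go
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
)

// streamMessage mirrors only the fields this sketch needs from a media message.
type streamMessage struct {
    Event string `json:"event"`
    Media struct {
        Track   string `json:"track"`
        Payload string `json:"payload"`
    } `json:"media"`
}

// trackBuffers accumulates decoded audio separately for each track name.
type trackBuffers map[string]*bytes.Buffer

// add decodes one raw websocket message and appends its audio to the
// buffer of its own track. Events other than "media" are ignored.
func (t trackBuffers) add(message []byte) error {
    var msg streamMessage
    if err := json.Unmarshal(message, &msg); err != nil {
        return err
    }
    if msg.Event != "media" {
        return nil
    }
    chunk, err := base64.StdEncoding.DecodeString(msg.Media.Payload)
    if err != nil {
        return err
    }
    buf, ok := t[msg.Media.Track]
    if !ok {
        buf = &bytes.Buffer{}
        t[msg.Media.Track] = buf
    }
    buf.Write(chunk)
    return nil
}

func main() {
    tracks := trackBuffers{}
    // Hand-written sample messages with tiny payloads, for illustration only.
    msgs := []string{
        `{"event":"media","media":{"track":"inbound","payload":"AAEC"}}`,
        `{"event":"media","media":{"track":"outbound","payload":"/v8="}}`,
        `{"event":"media","media":{"track":"inbound","payload":"AwQ="}}`,
    }
    for _, m := range msgs {
        if err := tracks.add([]byte(m)); err != nil {
            panic(err)
        }
    }
    for name, buf := range tracks {
        fmt.Printf("%s: %d bytes\n", name, buf.Len())
    }
}
```

Each buffer can then be written to its own file and converted with the same ffmpeg commands, since each one holds a single 8 kHz mono stream.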