There are a couple of posts that address this question, but I haven't been able to play back a saved file correctly. It usually plays back at half speed.
Convert 8kHz mulaw to 16KHz PCM in real time
Using the accepted answer to the question above, I base64-decoded the audio and saved the raw bytes in Go:
// Media event
type Media struct {
	Track     string `json:"track"`
	Chunk     string `json:"chunk"`
	Timestamp string `json:"timestamp"`
	Payload   string `json:"payload"`
}

// SaveAudio will upgrade connection to websocket and save the audio to file
func SaveAudio(w http.ResponseWriter, r *http.Request) {
	utility.DebugLogf("SaveAudio")
	c, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Print("upgrade:", err)
		return
	}
	defer utility.SafeClose(c)
	inBuf := bytes.Buffer{}
	loop := true
	for loop {
		_, message, err := c.ReadMessage()
		utility.PanicIfErr(err)
		decMessage := TwilioWSSMessage{}
		err = json.Unmarshal(message, &decMessage)
		utility.PanicIfErr(err)
		switch decMessage.Event {
		case "connected":
			utility.DebugLogf("Connected a %s protocol version:%s", decMessage.Protocol, decMessage.Version)
		case "start":
			utility.DebugLogf("Starting audio stream: %#v", decMessage.Start)
		case "media":
			// Every media payload goes into the same buffer, whatever its track.
			chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
			utility.PanicIfErr(err)
			_, err = inBuf.Write(chunk)
			utility.PanicIfErr(err)
		case "stop":
			utility.DebugLogf("Ending audio stream: %#v", decMessage.Stop)
			loop = false
		default:
			utility.LogWarningf("Unrecognized event type: %s", decMessage.Event)
			loop = false
		}
	}
	saveRaw(&inBuf)
}

// saveRaw writes the collected mu-law bytes to out.ulaw
func saveRaw(buf *bytes.Buffer) {
	rawOut, err := os.Create("out.ulaw")
	utility.PanicIfErr(err)
	_, err = rawOut.Write(buf.Bytes())
	utility.PanicIfErr(err)
}
Then I used ffmpeg to convert from mulaw to the default pcm_s16le:
ffmpeg -f mulaw -ar 8000 -ac 1 -i out.ulaw mulaw_decoded.wav
Then I upsampled the audio from 8 kHz to 16 kHz and played it with vlc:
ffmpeg -i mulaw_decoded.wav -ar 16000 upsampled.wav && vlc upsampled.wav
But it plays at half speed.
Ultimately I'd like to do it all in Rust or Go, but I can't even get it working locally with just ffmpeg.
Thanks in advance.
Output of the above two ffmpeg operations combined, using the suggested sox resampler:
cmd:
ffmpeg -y -loglevel verbose -f mulaw -ar 8000 -ac 1 -bits_per_raw_sample 8 -i testsamples/raw_mulaw_bytes -af aresample=resampler=soxr -ar 16000 upsampled.wav
output:
[mulaw @ 0x7fecc0814000] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, mulaw, from 'testsamples/raw_mulaw_bytes':
Duration: 00:00:20.74, bitrate: 64 kb/s
Stream #0:0: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[graph_0_in_0_0 @ 0x7fecc0505600] tb:1/8000 samplefmt:s16 samplerate:8000 chlayout:0x4
[Parsed_aresample_0 @ 0x7fecc0505280] ch:1 chl:mono fmt:s16 r:8000Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
Output #0, wav, to 'upsampled.wav':
Metadata:
ISFT : Lavf58.29.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.54.100 pcm_s16le
No more output streams to write to, finishing.
size= 648kB time=00:00:20.74 bitrate= 256.0kbits/s speed=1.55e+03x
video:0kB audio:648kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.011753%
Input file #0 (testsamples/raw_mulaw_bytes):
Input stream #0:0 (audio): 519 packets read (165920 bytes); 519 frames decoded (165920 samples);
Total: 519 packets (165920 bytes) demuxed
Output file #0 (upsampled.wav):
Output stream #0:0 (audio): 200 frames encoded (331840 samples); 200 packets muxed (663680 bytes);
Total: 200 packets (663680 bytes) muxed
[AVIOContext @ 0x7fecc0433cc0] Statistics: 4 seeks, 6 writeouts
[AVIOContext @ 0x7fecc042a6c0] Statistics: 165920 bytes read, 0 seeks
The audio sounds the same as before.
I finally figured it out thanks to an answer to "Slow motion effect when decoding OPUS audio stream", which mentions:
Another possible reason of "slow motion" is more than one stream decoded by the same decoder. But in this case you also get distorted slow audio.
The tracks for this call are inbound and outbound, and my code was writing both into the same buffer. So in case "media":, to save just one track:
if decMessage.Media.Track == "outbound" {
	chunk, err := base64.StdEncoding.DecodeString(decMessage.Media.Payload)
	utility.PanicIfErr(err)
	_, err = outboundBuf.Write(chunk)
	utility.PanicIfErr(err)
}
With that change, the ffmpeg commands above work as expected.