Tags: webrtc, audio-streaming, simplewebrtc, mediarecorder-api

Audio Playback Stuttering Issue in Web Application


I am developing a web application that streams audio from an admin to multiple users in real-time using WebRTC and Socket.IO. However, I am encountering an issue where the audio playback on the user side is stuttering or stopping intermittently. Here is an overview of the setup and the problem:

Setup: Admin captures audio from the microphone using getUserMedia and MediaRecorder. Audio is encoded from WAV to MP3 using lamejs and transmitted to users via Socket.IO. Users receive audio chunks and attempt to play them using the <audio> element.

Problem:

While the audio playback starts successfully, it often stutters or stops after a short period.

Admin-side code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Admin Broadcasting</title>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/lamejs/1.2.0/lame.min.js"></script>
</head>
<body>
    <h1>Admin Broadcasting</h1>
    <button id="startBroadcastBtn">Start Broadcasting</button>
    
    <script src="/socket.io/socket.io.js"></script>
    <script>
        const startBroadcastBtn = document.getElementById('startBroadcastBtn');
        const socket = io('ws://localhost:9000');
        let mediaRecorder;

        startBroadcastBtn.addEventListener('click', async () => {
            try {
                const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                if (!stream) {
                    console.error('getUserMedia failed to return a valid stream');
                    return;
                }

                // Initialize MediaRecorder with WAV format
                mediaRecorder = new MediaRecorder(stream);

                // Create WAV to MP3 converter using lamejs
                const mp3Encoder = new lamejs.Mp3Encoder(1, 44100, 128);

                // Add event listener for data availability
                mediaRecorder.ondataavailable = async (event) => {
                    try {
                        if (event.data.size > 0) {
                            console.log('Audio chunk captured:', event.data);

                            // Convert WAV audio data to MP3
                            const reader = new FileReader();
                            reader.onload = async () => {
                                const arrayBuffer = reader.result;
                                const buffer = new Int8Array(arrayBuffer);
                                const length = buffer.length;
                                const pcm = new Int16Array(length / 2);
                                
                                // Convert Int8Array to Int16Array (PCM data)
                                for (let i = 0; i < length; i += 2) {
                                    pcm[i / 2] = buffer[i] | (buffer[i + 1] << 8);
                                }

                                const mp3Data = mp3Encoder.encodeBuffer(pcm);
                                const mp3Blob = new Blob([mp3Data], { type: 'audio/mp3' });
                                console.log('Created MP3 blob:', mp3Blob);
                                socket.emit('admin-audio-chunk', mp3Blob);
                            };
                            reader.readAsArrayBuffer(event.data);
                        }
                    } catch (error) {
                        console.error('Error processing audio data:', error);
                    }
                };

                // Start recording
                console.log('Starting recording...');
                mediaRecorder.start(1000);
                console.log('Recording started.');
            } catch (error) {
                console.error('Error accessing media devices:', error);
            }
        });
    </script>
</body>
</html>

User-side code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>User Listening</title>
</head>
<body>
    <h1>User Listening</h1>
    <audio id="audioPlayer" controls preload="auto"></audio>

    <script src="/socket.io/socket.io.js"></script>
    <script>
        const audioPlayer = document.getElementById('audioPlayer');
        const socket = io('ws://localhost:9000');
        let audioChunks = [];

        socket.on('user-audio-chunk', (data) => {
            console.log('Received audio chunk from admin:', data);
            audioChunks.push(data);
            playAudioChunks();
        });

        function playAudioChunks() {
            if (audioChunks.length > 0 && audioPlayer.paused) {
                const audioBlob = new Blob(audioChunks, { type: 'audio/mp3' });
                audioPlayer.src = URL.createObjectURL(audioBlob);
                audioPlayer.play().catch((error) => {
                    console.error('Error playing audio:', error);
                });
                audioChunks = [];
            }
        }
    </script>
</body>
</html>

How can I play the audio smoothly and resolve this issue?


Solution

  • Setup: Admin captures audio from the microphone using getUserMedia and MediaRecorder. Audio is encoded from WAV to MP3 using lamejs and transmitted to users via Socket.IO. Users receive audio chunks and attempt to play them using the <audio> element.

    You're encoding chunks and playing back chunks... which is why you end up with chunky-sounding audio. If you want a stream of audio, you need to treat it like a stream.

    Don't instantiate a new codec every time. I don't know about your build of LAME, but regular LAME makes use of the bit reservoir, which carries data over from one frame to the next. So you need to set up one encoder instance and keep pushing PCM data into it.
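
    A minimal sketch of that single-instance approach, assuming you already have 16-bit PCM chunks (Int16Array, 44.1 kHz mono) to feed it; the sendPcmChunk/stopBroadcast helper names are made up here, while the 'admin-audio-chunk' event and socket come from the question's code:

    // One encoder for the entire broadcast, reused for every chunk.
    const mp3Encoder = new lamejs.Mp3Encoder(1, 44100, 128); // created ONCE

    function sendPcmChunk(pcm) { // hypothetical helper; pcm is an Int16Array of samples
        // Reusing the same encoder lets LAME's bit reservoir carry data across frames
        const mp3Data = mp3Encoder.encodeBuffer(pcm);
        if (mp3Data.length > 0) {
            socket.emit('admin-audio-chunk', new Blob([mp3Data], { type: 'audio/mpeg' }));
        }
    }

    function stopBroadcast() { // hypothetical helper, called when the broadcast ends
        // Flush whatever the encoder is still holding so the stream ends cleanly
        const tail = mp3Encoder.flush();
        if (tail.length > 0) {
            socket.emit('admin-audio-chunk', new Blob([tail], { type: 'audio/mpeg' }));
        }
    }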

    Now, on the playback side, don't use Web Sockets and all that at all. It's simpler than that. Use a simple <audio> element and stream the data over HTTP:

    <audio src="https://stream.example.com/something"></audio>
    

    Your server should take the chunks it gets and immediately write them to the client. If instead you try to decode each segment individually, you're just going to get chunks decoded, and they won't magically play in time and line up.
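
    A rough Node.js sketch of that idea, assuming the admin's MP3 chunks reach the server somehow (the /stream path and the broadcastChunk name are made up for illustration):

    const http = require('http');
    const listeners = new Set();

    const server = http.createServer((req, res) => {
        if (req.url === '/stream') {
            // Keep this response open and stream MP3 to the listener indefinitely
            res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
            listeners.add(res);
            req.on('close', () => listeners.delete(res));
        } else {
            res.writeHead(404);
            res.end();
        }
    });

    // Call this wherever the admin's chunks arrive on the server
    // (e.g. in your 'admin-audio-chunk' handler), writing each chunk straight through:
    function broadcastChunk(mp3Chunk) {
        for (const res of listeners) {
            res.write(mp3Chunk);
        }
    }

    server.listen(9000);

    The user page then needs nothing more than <audio src="http://localhost:9000/stream" controls></audio>; the browser buffers and plays the continuous MP3 stream on its own.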