Tags: java, android, react-native, audio, websocket

How to reduce audio data recorded with ENCODING_PCM_16BIT and sample rate 8000 Hz?


My goal is to reduce the size of the audio data generated by the react-native-recording library.

The expected result: the audio file size should be between 5,200 and 5,400 bytes per second of audio, matching the Opus voice bit rate of about 5.17 KiB/s at an 8 kHz sampling rate.

The actual result: the audio file size is roughly three times the expected result. For example, recording the phrase "A guest for Mr. Jerry" (about 1.6 seconds of audio) yields roughly 28,000 bytes.

Important

I plan to write a custom native module to achieve this goal. If you like, feel free to leave a link for me to read.

TL;DR

My end goal is to send the audio data through a WebSocket. I have deliberately removed the WebSocket code from the snippets below.

Steps to reproduce:

  1. Add listener
  let audioInt16: number[] = [];
  let listener;

  React.useEffect(() => {
    // eslint-disable-next-line react-hooks/exhaustive-deps
    listener = Recording.addRecordingEventListener((data: number[]) => {
      console.log('record', data.length);
      // eslint-disable-next-line react-hooks/exhaustive-deps
      audioInt16 = audioInt16.concat(data);
    });

    return () => {
      listener.remove();
    };
  }, []); // register the listener once on mount, remove it on unmount
  2. Record audio
  const startAsync = async () => {
    await PermissionsAndroid.requestMultiple([
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    ]);

    Recording.init({
      bufferSize: 4096,
      sampleRate: 8000,
      bitsPerChannel: 16,
      channelsPerFrame: 1,
    });

    Recording.start();
  };
  3. Save audio
  const saveAudio = async () => {
    const result = await RNSaveAudio.saveWav(path, audioInt16);
    console.log('save audio', result, path);
  };
  4. Play audio
  const playAudio = () => {
    if (player.canPrepare || player.canPlay) {
      player.prepare((err) => {
        if (err) {
          console.log(err);
          return; // do not attempt playback if prepare failed
        }
        console.log('play audio', player.duration);
        player.play();
      });
    }
  };

Solution

  • Update:

    I just did the calculations: with a sampleRate of 8000 and a mono channel at 16 bits (2 bytes) per sample, that's 16,000 bytes per second, so 1.6 seconds of audio is 25,600 bytes, which is roughly what you're getting.
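
    For reference, here is the same arithmetic as a short TypeScript sketch (the constants mirror the Recording.init config from the question):

        // Raw PCM size = sampleRate * channels * bytesPerSample * seconds
        const sampleRate = 8000;  // Hz, from Recording.init
        const channels = 1;       // mono
        const bytesPerSample = 2; // 16-bit PCM

        const byteRate = sampleRate * channels * bytesPerSample; // 16,000 bytes/s
        const totalBytes = byteRate * 1.6;                       // 25,600 bytes
        console.log(byteRate, totalBytes);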

    To get 5,200 to 5,400 bytes per second, you have to work backwards: select 8 bits (1 byte) per sample, which leaves you a sample rate of only 5,200 to 5,400 Hz to work with.
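
    As an illustration, here is a minimal sketch of the bit-depth half of that trade-off (toPcm8 is a hypothetical helper; WAV stores 8-bit PCM as unsigned bytes). Note that 8-bit samples at the original 8 kHz still come to 8,000 bytes per second, so you would also have to resample down to roughly 5,200 to 5,400 Hz:

        // Convert signed 16-bit samples to unsigned 8-bit PCM (WAV convention).
        function toPcm8(samples: number[]): Uint8Array {
          const out = new Uint8Array(samples.length);
          for (let i = 0; i < samples.length; i++) {
            // Keep the top 8 bits, then shift from signed to unsigned range.
            out[i] = ((samples[i] >> 8) & 0xff) ^ 0x80;
          }
          return out;
        }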

    PCM data is raw audio samples. The only way to fit higher-quality audio into fewer bytes is audio compression, e.g. encoding to MP3, AAC, Opus, or some other audio codec.


    Original reply:

    Looks like RNSaveAudio.saveWav has a fixed sampleRate of 44100:

    https://github.com/navrajkambo/RNSaveAudio/blob/master/android/src/main/java/com/navraj/rnsaveaudio/RNSaveAudioModule.java

                output = new DataOutputStream(new FileOutputStream(path));
                // WAVE header
                // see http://ccrma.stanford.edu/courses/422/projects/WaveFormat/
                writeString(output, "RIFF"); // chunk id
                writeInt(output, 36 + data.length); // chunk size
                writeString(output, "WAVE"); // format
                writeString(output, "fmt "); // subchunk 1 id
                writeInt(output, 16); // subchunk 1 size
                writeShort(output, (short) 1); // audio format (1 = PCM)
                writeShort(output, (short) 1); // number of channels
                writeInt(output, 44100); // sample rate
                writeInt(output, 44100 * 2); // byte rate
                writeShort(output, (short) 2); // block align
                writeShort(output, (short) 16); // bits per sample
                writeString(output, "data"); // subchunk 2 id
                writeInt(output, data.length); // subchunk 2 size
    

    You can simply write your own WAVE file writer based on this example. More on the WAVE format: http://soundfile.sapp.org/doc/WaveFormat/
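
    If you would rather stay on the JavaScript side, here is a minimal sketch of the same header in TypeScript, with the sample rate as a parameter instead of the hard-coded 44100 (buildWav is a hypothetical helper; persisting the resulting bytes to a file, e.g. with a file-system module, is left out):

        // Build a 44-byte WAV header followed by 16-bit mono PCM data.
        function buildWav(samples: number[], sampleRate: number): Uint8Array {
          const dataSize = samples.length * 2; // 2 bytes per 16-bit sample
          const view = new DataView(new ArrayBuffer(44 + dataSize));
          const writeString = (offset: number, s: string) => {
            for (let i = 0; i < s.length; i++) {
              view.setUint8(offset + i, s.charCodeAt(i));
            }
          };
          writeString(0, 'RIFF');                   // chunk id
          view.setUint32(4, 36 + dataSize, true);   // chunk size (little-endian)
          writeString(8, 'WAVE');                   // format
          writeString(12, 'fmt ');                  // subchunk 1 id
          view.setUint32(16, 16, true);             // subchunk 1 size
          view.setUint16(20, 1, true);              // audio format (1 = PCM)
          view.setUint16(22, 1, true);              // number of channels
          view.setUint32(24, sampleRate, true);     // sample rate, e.g. 8000
          view.setUint32(28, sampleRate * 2, true); // byte rate = rate * block align
          view.setUint16(32, 2, true);              // block align (mono, 16-bit)
          view.setUint16(34, 16, true);             // bits per sample
          writeString(36, 'data');                  // subchunk 2 id
          view.setUint32(40, dataSize, true);       // subchunk 2 size
          samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));
          return new Uint8Array(view.buffer);
        }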