Tags: java, android, react-native, audio, websocket

How to reduce audio data recorded with ENCODING_PCM_16BIT and sample rate 8000 Hz?


My goal is to reduce the size of the audio data generated by the react-native-recording library.

The expected result: the audio file size should be between 5,200 and 5,400 bytes per second of audio, matching the Opus voice bit rate of about 5.17 KiB/s at an 8 kHz sampling rate.

The actual result: the audio file size is roughly three times the expected result. For example, recording the phrase "A guest for Mr. Jerry" (about 1.6 seconds of audio) yields roughly 28,000 bytes.

Important

I plan to write a custom native module to achieve this goal. If you like, feel free to leave a link for me to read.

TL;DR

My end goal is to send the audio data through a WebSocket. I have deliberately removed the WebSocket code from the snippets below.

Steps to reproduce:

  1. Add listener
  let audioInt16: number[] = [];
  let listener;

  React.useEffect(() => {
    // eslint-disable-next-line react-hooks/exhaustive-deps
    listener = Recording.addRecordingEventListener((data: number[]) => {
      console.log('record', data.length);
      // eslint-disable-next-line react-hooks/exhaustive-deps
      audioInt16 = audioInt16.concat(data);
    });

    return () => {
      listener.remove();
    };
  }, []); // register the listener once on mount, remove it on unmount
  2. Record audio
  const startAsync = async () => {
    await PermissionsAndroid.requestMultiple([
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    ]);

    Recording.init({
      bufferSize: 4096,
      sampleRate: 8000,
      bitsPerChannel: 16,
      channelsPerFrame: 1,
    });

    Recording.start();
  };
  3. Save audio
  const saveAudio = async () => {
    const result = await RNSaveAudio.saveWav(path, audioInt16);
    console.log('save audio', result, path);
  };
  4. Play audio
  const playAudio = () => {
    if (player.canPrepare || player.canPlay) {
      player.prepare((err) => {
        if (err) {
          console.log(err);
          return; // do not attempt playback if prepare failed
        }
        console.log('play audio', player.duration);
        player.play();
      });
    }
  };

Solution

  • Update:

    I just did the calculations: with a sampleRate of 8000 and a mono channel at 16 bits (2 bytes) per sample, that's 16,000 bytes per second, so 1.6 seconds of audio is 25,600 bytes, which is roughly what you're getting.
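
    For reference, here is the same arithmetic as a short TypeScript sketch (the constants mirror the Recording.init config from the question):

        // Raw PCM size = sampleRate * channels * bytesPerSample * seconds
        const sampleRate = 8000;  // Hz, from Recording.init
        const channels = 1;       // mono
        const bytesPerSample = 2; // 16-bit PCM

        const byteRate = sampleRate * channels * bytesPerSample; // 16,000 bytes/s
        const totalBytes = byteRate * 1.6;                       // 25,600 bytes
        console.log(byteRate, totalBytes);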

    To get 5,200 to 5,400 bytes per second, you have to work backwards: select 8 bits (1 byte) per sample, which leaves you a sample rate of only 5,200 to 5,400 Hz to work with.
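
    As an illustration, here is a minimal sketch of the bit-depth half of that trade-off (toPcm8 is a hypothetical helper; WAV stores 8-bit PCM as unsigned bytes). Note that 8-bit samples at the original 8 kHz still come to 8,000 bytes per second, so you would also have to resample down to roughly 5,200 to 5,400 Hz:

        // Convert signed 16-bit samples to unsigned 8-bit PCM (WAV convention).
        function toPcm8(samples: number[]): Uint8Array {
          const out = new Uint8Array(samples.length);
          for (let i = 0; i < samples.length; i++) {
            // Keep the top 8 bits, then shift from signed to unsigned range.
            out[i] = ((samples[i] >> 8) & 0xff) ^ 0x80;
          }
          return out;
        }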

    PCM data is raw audio samples. The only way to fit higher-quality audio into fewer bytes is audio compression, e.g. encoding to MP3, AAC, Opus, or some other audio codec.


    Original reply:

    Looks like RNSaveAudio.saveWav has a fixed sampleRate of 44100:

    https://github.com/navrajkambo/RNSaveAudio/blob/master/android/src/main/java/com/navraj/rnsaveaudio/RNSaveAudioModule.java

                output = new DataOutputStream(new FileOutputStream(path));
                // WAVE header
                // see http://ccrma.stanford.edu/courses/422/projects/WaveFormat/
                writeString(output, "RIFF"); // chunk id
                writeInt(output, 36 + data.length); // chunk size
                writeString(output, "WAVE"); // format
                writeString(output, "fmt "); // subchunk 1 id
                writeInt(output, 16); // subchunk 1 size
                writeShort(output, (short) 1); // audio format (1 = PCM)
                writeShort(output, (short) 1); // number of channels
                writeInt(output, 44100); // sample rate
                writeInt(output, 44100 * 2); // byte rate
                writeShort(output, (short) 2); // block align
                writeShort(output, (short) 16); // bits per sample
                writeString(output, "data"); // subchunk 2 id
                writeInt(output, data.length); // subchunk 2 size
    

    You can simply write your own WAVE file writer based on this example. More on the WAVE format: http://soundfile.sapp.org/doc/WaveFormat/
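
    If you would rather stay on the JavaScript side, here is a minimal sketch of the same header in TypeScript, with the sample rate as a parameter instead of the hard-coded 44100 (buildWav is a hypothetical helper; persisting the resulting bytes to a file, e.g. with a file-system module, is left out):

        // Build a 44-byte WAV header followed by 16-bit mono PCM data.
        function buildWav(samples: number[], sampleRate: number): Uint8Array {
          const dataSize = samples.length * 2; // 2 bytes per 16-bit sample
          const view = new DataView(new ArrayBuffer(44 + dataSize));
          const writeString = (offset: number, s: string) => {
            for (let i = 0; i < s.length; i++) {
              view.setUint8(offset + i, s.charCodeAt(i));
            }
          };
          writeString(0, 'RIFF');                   // chunk id
          view.setUint32(4, 36 + dataSize, true);   // chunk size (little-endian)
          writeString(8, 'WAVE');                   // format
          writeString(12, 'fmt ');                  // subchunk 1 id
          view.setUint32(16, 16, true);             // subchunk 1 size
          view.setUint16(20, 1, true);              // audio format (1 = PCM)
          view.setUint16(22, 1, true);              // number of channels
          view.setUint32(24, sampleRate, true);     // sample rate, e.g. 8000
          view.setUint32(28, sampleRate * 2, true); // byte rate = rate * block align
          view.setUint16(32, 2, true);              // block align (mono, 16-bit)
          view.setUint16(34, 16, true);             // bits per sample
          writeString(36, 'data');                  // subchunk 2 id
          view.setUint32(40, dataSize, true);       // subchunk 2 size
          samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));
          return new Uint8Array(view.buffer);
        }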