
Get average amplitude of an audio file as it is being played


I am trying to build an Android app that plays audio and shows the current playback time and the average amplitude as sliders. Playing the audio and showing the current time was easy with MediaPlayer.

However, I cannot get the correct average amplitude of the audio being played.

This is the relevant part of my code:

  val mediaPlayerMap: MutableMap<String, MediaPlayer> = mutableMapOf()
  val visualizerMap: MutableMap<String, Visualizer> = mutableMapOf()

  private fun createDataCaptureListener(visualizer: Visualizer): Visualizer.OnDataCaptureListener {
    return object : Visualizer.OnDataCaptureListener {

      override fun onWaveFormDataCapture(
          visualizer: Visualizer,
          waveFormData: ByteArray,
          samplingRateMilliHerz: Int
      ) {
        // Wrap the ByteArray in a ByteBuffer
        val buffer = ByteBuffer.wrap(waveFormData)
        var maxAmplitude = 0.0f
        var sum = 0.0f
        var count = 0
        while (buffer.hasRemaining()) {
          val amplitude = buffer.getShort()
          sum += kotlin.math.abs(amplitude.toInt())
          count++
          if (amplitude > maxAmplitude) {
            maxAmplitude = amplitude.toFloat()
          }
        }
        var averageAmplitude = 0.0f
        if (count > 0) {
          averageAmplitude = sum / count
        }

        // Normalize amplitudes
        val normalizedMaxAmplitude = maxAmplitude / 32768.0f
        val normalizedAverageAmplitude = averageAmplitude / 32768.0f
        Log.e(
            "Visualizer",
            "maxAmplitude: $normalizedMaxAmplitude, averageAmplitude: $normalizedAverageAmplitude")
      }

      override fun onFftDataCapture(
          visualizer: Visualizer,
          fftData: ByteArray,
          samplingRateMilliHerz: Int
      ) {
        // Do nothing for now
      }
    }
  }

  fun updateAudioSettings(fileUrl: String, play: Boolean): Float {
    var mediaPlayer = mediaPlayerMap[fileUrl]
    if (mediaPlayer == null) {
      mediaPlayer =
          MediaPlayer().apply {
            setDataSource(fileUrl)
            prepare()
          }

      mediaPlayerMap[fileUrl] = mediaPlayer
    }

    var visualizer = visualizerMap[fileUrl]

    if (play) {
      if (!mediaPlayer.isPlaying) {
        mediaPlayer.start()
      }

      if (visualizer == null || visualizer?.enabled == false) {
        visualizer?.release()
        val visualizer = Visualizer(mediaPlayer.audioSessionId)
        val rate = Visualizer.getMaxCaptureRate()
        visualizer.setDataCaptureListener(
            createDataCaptureListener(visualizer),
            rate,
            true, // wave form requested
            true, // fft requested
        )
        visualizer.enabled = true
        visualizerMap[fileUrl] = visualizer
      }
    } else {
      if (mediaPlayer.isPlaying) {
        mediaPlayer.pause()
      }
      visualizer?.release()
      visualizerMap.remove(fileUrl)
    }

    return mediaPlayer.currentPosition / 1000.0f
  }

I have also tried using RMS.

The code runs and returns numbers, but they are not correct. For a 16-bit PCM audio file it reports values in the range 0.5-0.75, whereas https://www.maztr.com/audiofileanalyzer reports an average amplitude of 0.000007 and a maximum amplitude of 0.019836 for the same file.

I am open to using ExoPlayer if it handles this better. Ideally, I would like to calculate the average amplitude without having to worry about the encoding of the audio file. I am looking for an Android equivalent of https://developer.apple.com/documentation/audiotoolbox/kmultichannelmixerparam_postaveragepower.

Is that possible?


Solution

  • The Visualizer provides low-quality audio data. Its waveform capture is intended for simple visualization and is not a good source for analyzing the audio.

    The waveform byte array contains unsigned 8-bit mono samples, but your code reads it as signed 16-bit shorts, so every pair of bytes that are really two separate samples is combined into one value. This is why the calculation is incorrect.

    Also, mean amplitude is not a very useful figure for audio, because it does not reflect perceived loudness. Consider using root mean square (RMS) amplitude instead. Note that you can get pairs of peak and RMS values [*] using the getMeasurementPeakRms() method.

    [*] The values are given in millibels (1 mB = 0.01 dB) and can be converted to a linear scale if needed.
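
The two points above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `Levels`, `waveformLevels` and `millibelsToLinear` are hypothetical helper names, not part of the Android API. It assumes the waveform capture is unsigned 8-bit PCM with 128 as the zero level. Note that the Visualizer's default scaling mode normalizes the captured data, so even correctly decoded values may not match the absolute levels reported by a file analyzer.

```kotlin
import kotlin.math.abs
import kotlin.math.pow
import kotlin.math.sqrt

/** Peak and RMS of one waveform capture, both on a linear 0.0..1.0 scale. */
data class Levels(val peak: Float, val rms: Float)

/** Decodes unsigned 8-bit samples (assumed zero level: 128) and measures them. */
fun waveformLevels(waveform: ByteArray): Levels {
    if (waveform.isEmpty()) return Levels(0f, 0f)
    var peak = 0f
    var sumSquares = 0.0
    for (b in waveform) {
        // Kotlin bytes are signed, so mask to recover the unsigned value 0..255.
        val unsigned = b.toInt() and 0xFF
        // Center on the zero level and scale to roughly -1.0..1.0.
        val sample = (unsigned - 128) / 128f
        val magnitude = abs(sample)
        if (magnitude > peak) peak = magnitude
        sumSquares += sample.toDouble() * sample
    }
    return Levels(peak, sqrt(sumSquares / waveform.size).toFloat())
}

/** Converts a level in millibels to a linear amplitude ratio (0 mB -> 1.0). */
fun millibelsToLinear(millibels: Int): Float =
    // 1 mB = 0.01 dB, and amplitude ratio = 10^(dB / 20) = 10^(mB / 2000).
    10.0.pow(millibels / 2000.0).toFloat()

// Possible usage on Android (API 19+), shown as comments because it needs a device:
//   visualizer.setMeasurementMode(Visualizer.MEASUREMENT_MODE_PEAK_RMS)
//   val measurement = Visualizer.MeasurementPeakRms()
//   if (visualizer.getMeasurementPeakRms(measurement) == Visualizer.SUCCESS) {
//       val peak = millibelsToLinear(measurement.mPeak)
//       val rms = millibelsToLinear(measurement.mRms)
//   }
```

Inside `onWaveFormDataCapture`, calling `waveformLevels(waveFormData)` would replace the `ByteBuffer.getShort()` loop from the question. The measurement-mode route avoids decoding the waveform at all, which is probably the closer match to the post-average-power parameter on iOS.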