Tags: c++, audio, unreal-blueprint, unreal-engine5

Analyzing audio in C++ / Unreal Engine 5 without plugins


This project is done in Unreal Engine 5. I would prefer doing this in C++ instead of Blueprints, but I'm open to any ideas.

I am trying to take an audio file and create the data needed to make an audio waveform (sample image of audio waveform below).

[Image: sample audio waveform]

I would go "frame by frame" (sorry, I don't know the audio equivalent of a video frame) and try to find the information needed for this, such as the peaks and troughs of the audio file. For example:

frame 0: audio level 0
frame 1: audio level 1
frame 2: audio level 0
frame 3: audio level 2
frame 4: audio level 0

and using this data, a visual waveform can be created.

I have been looking for resources on this, but they all involve plugins that already do it. I need to create this feature myself and I don't know where to start. If anyone has any sources or tips to get started, that would be much appreciated. Thank you.


Solution

  • These are the steps to generate a graphical waveform.

    1. Load a wave file (*.wav) into memory and parse the wave header to retrieve the format and the actual audio samples. I suggest using a library for this; there are quite a few of them (e.g. libsndfile, http://www.mega-nerd.com/libsndfile/). PortAudio (http://www.portaudio.com/) is also often mentioned, but it handles audio playback and recording rather than reading files.
    • The wave format will typically be either WAVE_FORMAT_PCM or WAVE_FORMAT_IEEE_FLOAT. PCM samples are usually encoded either as 8-bit unsigned values (range [0, 255]) or as 16-bit signed values (range [-32768, 32767]). IEEE float samples are 32-bit floating-point values (range [-1, 1]).
    2. While not strictly necessary, I suggest converting WAVE_FORMAT_PCM samples to floating-point values (as in WAVE_FORMAT_IEEE_FLOAT), since that makes later processing easier.

    The conversion could be done like this:

      #include <cstring>
      #include <vector>

      enum BitDepth { BIT_8, BIT_16, BIT_32 };

      // BIT_8/BIT_16 are WAVE_FORMAT_PCM, BIT_32 is WAVE_FORMAT_IEEE_FLOAT.
      // pSamples points at the raw sample bytes as stored in the file (little-endian);
      // len is the number of samples, not the number of bytes.
      // Assumes a little-endian host (the norm on x86 and ARM).
      std::vector<float> ConvertToFloat(const char *pSamples, int len, BitDepth bitDepth) {

          const unsigned char cByteMax  =   255;
          const unsigned char cByteHalf =   128;
          const float         cShortMax = 32768.0f;

          // Returned by value: no manual delete needed, unlike a raw new'd pointer.
          std::vector<float> convertedSamples(len);

          switch (bitDepth) {
              case BitDepth::BIT_8:   // unsigned byte, 128 is silence
                  for (int i = 0; i < len; i++) {
                      const unsigned char s = static_cast<unsigned char>(pSamples[i]);
                      convertedSamples[i] = s == cByteMax  ? 1.0f
                                          : s == cByteHalf ? 0.0f
                                          : static_cast<float>(s - cByteHalf) / cByteHalf;
                  }
                  break;
              case BitDepth::BIT_16:  // signed short
                  for (int i = 0; i < len; i++) {
                      short s;
                      // memcpy avoids the misaligned read a reinterpret_cast<const short*> could cause
                      std::memcpy(&s, pSamples + 2 * i, sizeof s);
                      convertedSamples[i] = static_cast<float>(s) / cShortMax;
                  }
                  break;
              case BitDepth::BIT_32:  // floating point, already in [-1, 1]: just copy
                  if (len > 0) {
                      std::memcpy(convertedSamples.data(), pSamples, len * sizeof(float));
                  }
                  break;
              default:
                  break;
          }

          return convertedSamples;
      }
    
    3. Before we start iterating through the samples, we need to know how many samples there are in total and how wide, in pixels, the generated waveform is going to be. This lets us compute a samples-per-waveform-line metric. For instance, if the file consists of 96000 samples and the waveform is 800 pixels wide, then 96000 / 800 = 120 samples per waveform line (let's call this X), which means that we have to process 120 samples for every waveform line (1 pixel wide). In general, this metric won't be an integer, so you have to decide what to do in that case. If X = 120.5, we could alternate between 120 and 121 samples for consecutive waveform lines. If we don't take this into consideration, we'll either run out of samples before we draw the entire width of the waveform, or draw the entire waveform before processing all the samples. No scheme maps samples to pixels perfectly, so pick one that is good enough for you.

    4. For every waveform line (1 pixel wide), iterate through its X samples and find the peak values among both the positive and the negative samples (2 peak values for every waveform line). The height of the line above the centre is then (waveform_height * positive_peak_value)/2, and the height below the centre is (waveform_height * negative_peak_value)/2. This assumes that the samples are in the [-1, +1] range (step #2).

    Note that stereo files are interleaved: a left-channel sample is followed immediately by a right-channel sample, then by a left-channel sample, and so on. Either process each channel separately or mix the channels down to one before computing the peaks.
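A library such as libsndfile does the header parsing from the first step for you, but since the question asks for no plugins, here is a minimal C++17 sketch of what that parsing involves. It assumes the whole .wav file is already in memory (in UE5 the bytes could, for example, be loaded with FFileHelper::LoadFileToArray into a TArray<uint8>; the sketch uses std::vector<uint8_t> so it stands alone). WavInfo and ParseWav are hypothetical names, and WAVE_FORMAT_EXTENSIBLE or compressed formats are not interpreted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical result type: how the samples are encoded and where they live in the buffer.
struct WavInfo {
    uint16_t formatTag = 0;      // 1 = WAVE_FORMAT_PCM, 3 = WAVE_FORMAT_IEEE_FLOAT
    uint16_t numChannels = 0;
    uint32_t sampleRate = 0;
    uint16_t bitsPerSample = 0;
    size_t dataOffset = 0;       // byte offset of the first sample
    size_t dataSize = 0;         // size of the sample data in bytes
};

// WAV data is little-endian; assembling bytes manually makes the host's endianness irrelevant.
static uint16_t ReadU16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
static uint32_t ReadU32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Sketch: walks the RIFF chunks of an in-memory .wav file and returns the "fmt " and "data"
// information, or std::nullopt if the buffer is not a WAV file this sketch understands.
std::optional<WavInfo> ParseWav(const std::vector<uint8_t>& file) {
    if (file.size() < 12 || std::memcmp(file.data(), "RIFF", 4) != 0 ||
        std::memcmp(file.data() + 8, "WAVE", 4) != 0)
        return std::nullopt;

    WavInfo info;
    bool haveFmt = false, haveData = false;
    size_t pos = 12;  // the first chunk follows the 12-byte RIFF header
    while (pos + 8 <= file.size()) {
        const uint8_t* chunk = file.data() + pos;
        const uint32_t chunkSize = ReadU32(chunk + 4);
        const size_t body = pos + 8;
        if (chunkSize > file.size() - body) break;  // truncated chunk
        if (std::memcmp(chunk, "fmt ", 4) == 0 && chunkSize >= 16) {
            info.formatTag = ReadU16(file.data() + body);
            info.numChannels = ReadU16(file.data() + body + 2);
            info.sampleRate = ReadU32(file.data() + body + 4);
            info.bitsPerSample = ReadU16(file.data() + body + 14);
            haveFmt = true;
        } else if (std::memcmp(chunk, "data", 4) == 0) {
            info.dataOffset = body;
            info.dataSize = chunkSize;
            haveData = true;
        }
        pos = body + chunkSize + (chunkSize & 1);  // chunks are padded to an even size
    }
    if (!haveFmt || !haveData) return std::nullopt;
    return info;
}
```

Unknown chunks (such as "LIST" metadata) are skipped rather than rejected, which is why the loop walks every chunk instead of assuming "data" sits at byte 44.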
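The two waveform steps (splitting the samples into 1-pixel lines and finding each line's positive and negative peaks) could look roughly like this, assuming mono samples already converted to [-1, 1]. Using the integer boundaries i*N/width is one way, among several, to handle a non-integer samples-per-line value: with 96000 samples and 800 pixels every line gets exactly 120 samples, and when the ratio is fractional the line sizes are spread between floor(N/width) and ceil(N/width) so every sample is used exactly once. ColumnPeaks, ComputeColumnPeaks and PeakToPixels are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-line result: the most positive and most negative sample in that line.
struct ColumnPeaks {
    float positive = 0.0f;  // in [0, 1]
    float negative = 0.0f;  // in [-1, 0]
};

// Splits `samples` (mono, in [-1, 1]) into `width` waveform lines and finds the peaks of each.
// Line i covers samples [i*N/width, (i+1)*N/width). If there are fewer samples than pixels,
// some lines are empty and keep peaks of 0.
std::vector<ColumnPeaks> ComputeColumnPeaks(const std::vector<float>& samples, size_t width) {
    std::vector<ColumnPeaks> columns(width);
    const uint64_t n = samples.size();
    for (size_t i = 0; i < width; ++i) {
        const size_t begin = static_cast<size_t>(i * n / width);
        const size_t end = static_cast<size_t>((i + 1) * n / width);
        for (size_t s = begin; s < end; ++s) {
            columns[i].positive = std::max(columns[i].positive, samples[s]);
            columns[i].negative = std::min(columns[i].negative, samples[s]);
        }
    }
    return columns;
}

// (waveform_height * peak) / 2: the signed line length in pixels measured from the vertical
// centre; negative peaks give negative values, i.e. lines extending below the centre.
int PeakToPixels(float peak, int height) {
    return static_cast<int>(peak * height / 2.0f);
}
```

Keeping both peaks rather than a single absolute maximum lets asymmetric signals be drawn as they are, at the cost of two values per line.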
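For the stereo note, a group of one sample per channel is usually called a (sample) frame. One way to get a single stream that the per-line peak search can consume is to pull out one channel, or to average the channels of each frame. ExtractChannel and MixToMono are hypothetical helpers; averaging is just one mix-down choice, and drawing each channel as its own waveform is equally valid.

```cpp
#include <cstddef>
#include <vector>

// Returns the samples of one channel from an interleaved buffer (L R L R ... for stereo).
// For a stereo file, channel 0 is left and channel 1 is right.
std::vector<float> ExtractChannel(const std::vector<float>& interleaved, size_t numChannels,
                                  size_t channel) {
    std::vector<float> out;
    if (numChannels == 0 || channel >= numChannels) return out;
    out.reserve(interleaved.size() / numChannels);
    for (size_t i = channel; i < interleaved.size(); i += numChannels)
        out.push_back(interleaved[i]);
    return out;
}

// Averages the channels of each frame into one mono sample.
// An incomplete trailing frame, if any, is dropped.
std::vector<float> MixToMono(const std::vector<float>& interleaved, size_t numChannels) {
    std::vector<float> out;
    if (numChannels == 0) return out;
    const size_t frames = interleaved.size() / numChannels;
    out.reserve(frames);
    for (size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (size_t c = 0; c < numChannels; ++c) sum += interleaved[f * numChannels + c];
        out.push_back(sum / static_cast<float>(numChannels));
    }
    return out;
}
```

Note that averaging can shrink the drawn waveform when the channels are out of phase; taking each channel separately avoids that.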

    Hope this helps.