This project is done in Unreal Engine 5. I would prefer doing this in C++ instead of Blueprints, but I'm open to any ideas.
I am trying to take an audio file and create the data needed to make an audio waveform (sample image of audio waveform below).
I would go "frame by frame" (sorry, I don't know the audio equivalent of a video frame) and try to find the information needed for this, such as the peaks and troughs of the audio file. For example:
frame 0: audio level 0
frame 1: audio level 1
frame 2: audio level 0
frame 3: audio level 2
frame 4: audio level 0
and using this data, a visual waveform can be created.
I have been looking at resources for this, but they all involve plugins that already do it. I need to create this feature myself and I don't know where to start. If anyone has any sources or tips to get started, that would be much appreciated. Thank you.
These are the steps to generate a graphical waveform.
First, convert the raw samples to floating-point values in the [-1, +1] range. The conversion could simply be done like this:
```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class BitDepth { BIT_8, BIT_16, BIT_32 };
// BIT_8/BIT_16 are WAVE_FORMAT_PCM, BIT_32 is WAVE_FORMAT_IEEE_FLOAT

// Converts `len` samples (not bytes) starting at pSamples to floats in [-1, +1].
// Like the WAV format itself, this assumes little-endian data on a little-endian machine.
std::vector<float> ConvertToFloat(const char* pSamples, std::size_t len, BitDepth bitDepth) {
    const float cByteHalf = 128.0f;   // 8-bit PCM is unsigned, silence is 128
    const float cShortMax = 32768.0f; // 16-bit PCM is signed, range [-32768, 32767]
    std::vector<float> convertedSamples(len);
    switch (bitDepth) {
    case BitDepth::BIT_8: // unsigned byte
        for (std::size_t i = 0; i < len; i++) {
            const unsigned char s = static_cast<unsigned char>(pSamples[i]);
            // 255 is mapped to exactly 1.0, everything else to (s - 128) / 128
            convertedSamples[i] = s == 255 ? 1.0f : (s - cByteHalf) / cByteHalf;
        }
        break;
    case BitDepth::BIT_16: // signed short
        for (std::size_t i = 0; i < len; i++) {
            std::int16_t s;
            // memcpy avoids the alignment/aliasing problems of reinterpret_cast
            std::memcpy(&s, pSamples + i * sizeof(s), sizeof(s));
            convertedSamples[i] = s / cShortMax;
        }
        break;
    case BitDepth::BIT_32: // floating point
        // just copy the raw floats, no conversion is necessary
        if (len > 0) {
            std::memcpy(convertedSamples.data(), pSamples, len * sizeof(float));
        }
        break;
    }
    return convertedSamples; // returned by value, so there is no manual delete
}
```
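The conversion needs to know the bit depth, and the format tags mentioned in the code comment come from the "fmt " chunk of a WAV file. Below is a minimal sketch of reading them, assuming a plain RIFF/WAVE file that is already loaded into memory as bytes. `WavInfo`, `ReadLE` and `ParseWav` are hypothetical names, and WAVE_FORMAT_EXTENSIBLE files (format tag 0xFFFE) and compressed formats are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical summary of the header fields needed before converting samples.
struct WavInfo {
    std::uint16_t formatTag = 0;     // 1 = WAVE_FORMAT_PCM, 3 = WAVE_FORMAT_IEEE_FLOAT
    std::uint16_t numChannels = 0;
    std::uint32_t sampleRate = 0;
    std::uint16_t bitsPerSample = 0;
    std::size_t dataOffset = 0;      // byte offset of the first sample
    std::uint32_t dataSize = 0;      // size of the sample data in bytes
};

// Reads an n-byte little-endian unsigned integer (WAV headers are little-endian).
static std::uint32_t ReadLE(const std::vector<unsigned char>& b, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<std::uint32_t>(b[pos + i]) << (8 * i);
    return v;
}

// Walks the RIFF chunks; returns nullopt if the file is not RIFF/WAVE, is
// truncated, or has no "fmt " chunk before its "data" chunk.
std::optional<WavInfo> ParseWav(const std::vector<unsigned char>& b) {
    if (b.size() < 12 || std::memcmp(b.data(), "RIFF", 4) != 0 ||
        std::memcmp(b.data() + 8, "WAVE", 4) != 0) return std::nullopt;
    WavInfo info;
    bool haveFmt = false;
    std::size_t pos = 12;                      // first chunk follows the RIFF header
    while (pos + 8 <= b.size()) {
        const std::uint32_t size = ReadLE(b, pos + 4, 4);
        const std::size_t body = pos + 8;      // chunk payload starts after id + size
        if (std::memcmp(b.data() + pos, "fmt ", 4) == 0 && size >= 16 && body + 16 <= b.size()) {
            info.formatTag = static_cast<std::uint16_t>(ReadLE(b, body, 2));
            info.numChannels = static_cast<std::uint16_t>(ReadLE(b, body + 2, 2));
            info.sampleRate = ReadLE(b, body + 4, 4);
            info.bitsPerSample = static_cast<std::uint16_t>(ReadLE(b, body + 14, 2));
            haveFmt = true;
        } else if (std::memcmp(b.data() + pos, "data", 4) == 0) {
            if (!haveFmt || body + size > b.size()) return std::nullopt;
            info.dataOffset = body;
            info.dataSize = size;
            return info;
        }
        pos = body + size + (size & 1);        // chunks are padded to an even size
    }
    return std::nullopt;
}
```

With this, a format tag of 1 and 8 or 16 bits per sample corresponds to BIT_8 or BIT_16, a tag of 3 with 32 bits corresponds to BIT_32, and the sample count to pass to the conversion is dataSize / (bitsPerSample / 8).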
Before iterating through the samples, we need to know how many samples there are in total and how wide, in pixels, the generated waveform will be. From these two numbers we can compute a samples-per-waveform-line metric. For instance, if the file consists of 96000 samples and the waveform is 800 pixels wide, then 96000 / 800 = 120 samples per waveform line (let's call this X), which means we have to process 120 samples for every waveform line (1 pixel wide). In general X won't be an integer, so you have to decide what to do in that case. If X = 120.5, we could alternate between 120 and 121 samples for consecutive waveform lines (so X varies between 120 and 121 samples). If we don't take this into account, we'll either run out of samples before drawing the entire width of the waveform or finish drawing the waveform before processing all the samples. This process can't be 100% accurate, but you have to come up with a solution that's good enough for you.
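One way to get the alternating 120/121 behaviour without tracking a running remainder is to compute each line's boundaries with integer arithmetic: line i starts at floor(i * N / W). This is only a sketch; `SampleRangesPerLine` is a hypothetical helper, not part of any engine API. If there are fewer samples than pixels, some ranges come out empty.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the [begin, end) sample range for each of the
// `widthPixels` waveform lines. Because line i starts at i * totalSamples / widthPixels
// (integer division), consecutive lines get either floor(X) or ceil(X) samples,
// and the last line ends exactly at totalSamples.
std::vector<std::pair<std::size_t, std::size_t>> SampleRangesPerLine(
    std::size_t totalSamples, std::size_t widthPixels) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(widthPixels);
    for (std::size_t i = 0; i < widthPixels; ++i) {
        const std::size_t begin = i * totalSamples / widthPixels;
        const std::size_t end = (i + 1) * totalSamples / widthPixels;
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

For 96000 samples and 800 pixels every range holds exactly 120 samples; for 241 samples and 2 pixels the ranges hold 120 and 121 samples, so no samples are skipped or left over.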
For every waveform line (1 pixel wide), iterate through its X samples and find the peak values among the positive and the negative samples (2 peak values for every waveform line). The line then extends (waveform_height * positive_peak_value) / 2 above the vertical center of the waveform and (waveform_height * |negative_peak_value|) / 2 below it. This assumes the samples have already been converted to the [-1, +1] range, as in the conversion code above.
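The peak search and the height formula can be sketched as follows, assuming mono samples already in [-1, +1]. `WaveformLine` and `BuildWaveform` are hypothetical names, and the line boundaries use the floor(i * N / W) split so that a fractional X is handled automatically.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One waveform line: how far (in pixels) it extends above and below the center.
struct WaveformLine {
    float up;    // waveform_height * positive_peak / 2
    float down;  // waveform_height * |negative_peak| / 2
};

// Hypothetical helper: `samples` are mono floats in [-1, +1]. Line i covers
// the sample range [i * N / W, (i + 1) * N / W).
std::vector<WaveformLine> BuildWaveform(const std::vector<float>& samples,
                                        std::size_t widthPixels, float waveformHeight) {
    std::vector<WaveformLine> lines(widthPixels, WaveformLine{0.0f, 0.0f});
    const std::size_t n = samples.size();
    for (std::size_t i = 0; i < widthPixels; ++i) {
        const std::size_t begin = i * n / widthPixels;
        const std::size_t end = (i + 1) * n / widthPixels;
        float positivePeak = 0.0f;  // largest value >= 0 in this range
        float negativePeak = 0.0f;  // smallest value <= 0 in this range
        for (std::size_t s = begin; s < end; ++s) {
            positivePeak = std::max(positivePeak, samples[s]);
            negativePeak = std::min(negativePeak, samples[s]);
        }
        lines[i].up = waveformHeight * positivePeak / 2.0f;
        lines[i].down = waveformHeight * -negativePeak / 2.0f;
    }
    return lines;
}
```

Each line i can then be drawn as a 1-pixel-wide vertical segment running from `up` pixels above the center line to `down` pixels below it; how you draw it (a texture you fill yourself, a widget, etc.) is up to you and is not shown here.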
Note that stereo files are interleaved, which means that a left channel sample is followed immediately by a right channel sample, followed by a left channel sample and so on.
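If you want a single waveform for a stereo file, you have to de-interleave the samples first. Below is a minimal sketch that averages the channels of each frame; `MixToMono` is a hypothetical helper, and drawing just one channel (`interleaved[frame * numChannels + channel]`) works equally well.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: interleaved [L0, R0, L1, R1, ...] -> one mono track,
// averaging the channels of each frame. numChannels must be >= 1; a trailing
// partial frame, if any, is dropped.
std::vector<float> MixToMono(const std::vector<float>& interleaved, std::size_t numChannels) {
    const std::size_t frames = interleaved.size() / numChannels;
    std::vector<float> mono(frames, 0.0f);
    for (std::size_t f = 0; f < frames; ++f) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < numChannels; ++c) {
            sum += interleaved[f * numChannels + c];
        }
        mono[f] = sum / static_cast<float>(numChannels);
    }
    return mono;
}
```

Averaging can cancel out when the two channels are out of phase; picking the sample with the larger magnitude in each frame avoids that. Either way, the total used to compute X should be the number of frames (samples per channel), not the raw interleaved sample count.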
Hope this helps.