I understand that PCM data is stored as [left][right][left][right]... I am trying to convert a stereo PCM to mono Vorbis (*.ogg), which I understand is achievable by averaging the left and right channels ((left+right)*0.5). I have actually achieved this by amending the encoder example in the libvorbis SDK like this:
#define READ 1024
signed char readbuffer[READ*4];
and the PCM data is read thus
fread(readbuffer, 1, READ*4, stdin)
I then averaged the two channels:
buffer[0][i] = ((((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))/32768.f) + (((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))/32768.f)) * 0.5f;
It worked perfectly, but I don't understand how they deinterleave the left and right channels from the PCM data (i.e. all the bit shifting, "ANDing" and "ORing").
A .wav file typically stores its PCM data in little endian format, with 16 bits per sample per channel. For the usual signed 16-bit PCM file, this means that the data is physically stored as
[LEFT LSB] [LEFT MSB] [RIGHT LSB] [RIGHT MSB] ...
so that every group of 4 bytes makes up a single stereo PCM sample. Hence, you can find sample i by looking at bytes 4*i through 4*i+3, inclusive.
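As a small sketch of that layout, the byte offsets for stereo sample i can be written out explicitly. The helper names and the BYTES_PER_FRAME constant below are illustrative only; they do not come from libvorbis or from the encoder example.

```c
#include <stddef.h>

/* 2 channels * 2 bytes per 16-bit sample (assumed format: 16-bit stereo). */
enum { BYTES_PER_FRAME = 4 };

/* Byte offsets of each part of stereo sample i in the interleaved buffer.
   These helper names are hypothetical, chosen for illustration. */
static size_t left_lsb(size_t i)  { return i * BYTES_PER_FRAME + 0; }
static size_t left_msb(size_t i)  { return i * BYTES_PER_FRAME + 1; }
static size_t right_lsb(size_t i) { return i * BYTES_PER_FRAME + 2; }
static size_t right_msb(size_t i) { return i * BYTES_PER_FRAME + 3; }
```

For sample 2, for instance, this gives bytes 8 through 11.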
To decode a single 16-bit value from two bytes, you do this:
(MSB << 8) | LSB
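Because your read buffer values are stored as signed chars, you have to be a bit careful because both MSB and LSB will be sign-extended when they are promoted to int. This is what you want for the MSB, which carries the sign of the sample, but it is undesirable for the LSB; therefore, the code uses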
0xff & (int)LSB
to obtain the unsigned version of the low byte (technically, this works by converting to an int and selecting the low 8 bits; an alternative formulation would be to just write (uint8_t)LSB).
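To make the byte assembly concrete, here is a minimal sketch of the decoding step as a standalone function. The name decode_s16le is hypothetical and not part of libvorbis. It multiplies the MSB by 256 instead of shifting, because left-shifting a negative signed value is undefined behavior in C; on the usual two's-complement platforms the result is the same as (MSB << 8) | LSB.

```c
/* Assemble one signed 16-bit little-endian sample from two bytes.
   decode_s16le is a hypothetical helper name used for illustration. */
static int decode_s16le(signed char lsb, signed char msb)
{
    /* The MSB keeps its sign, since it carries the sign of the sample.
       The LSB is masked to 0..255 so that its sign extension cannot
       overwrite the high bits. Because msb * 256 has its low 8 bits
       clear, adding the masked LSB is equivalent to ORing it in. */
    return (int)msb * 256 + (0xff & (int)lsb);
}
```

For example, the bytes 0x80, 0x00 (LSB first) decode to 128. Without the mask, the low byte would be sign-extended to -128 and the ORed result would be -128 instead.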
Note that the MSBs are at indices 1 and 3, and the LSBs are at indices 0 and 2. So,
((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))
and
((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))
are just obtaining the values of the left and right channels as 16-bit signed values by using some bit manipulation to assemble the bytes into numbers.
Then, each of these values is divided by 32768.0. Note that a signed 16-bit value has a range of [-32768, 32767]. Thus, dividing by 32768 gives a range of approximately [-1, 1]. The two divided values are added to give a number in the range [-2, 2], and then the whole thing is multiplied by 0.5 to obtain the average (a floating-point value in the range [-1, 1]).
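Putting the steps together, the whole downmix could be sketched as a loop like the one below. It assumes 16-bit little-endian interleaved stereo input. The function name downmix_to_mono and its parameters are hypothetical; in the encoder example the result would be written to buffer[0][i] instead.

```c
#include <stddef.h>

/* Average interleaved 16-bit little-endian stereo samples into mono floats.
   downmix_to_mono is a hypothetical helper, not a libvorbis function.
   pcm must hold frames * 4 bytes; mono must hold frames floats. */
static void downmix_to_mono(const signed char *pcm, size_t frames, float *mono)
{
    for (size_t i = 0; i < frames; i++) {
        /* Each group of 4 bytes: left LSB, left MSB, right LSB, right MSB.
           Multiplying by 256 stands in for << 8, which is undefined for
           negative values in C. */
        int left  = (int)pcm[i * 4 + 1] * 256 + (0xff & (int)pcm[i * 4]);
        int right = (int)pcm[i * 4 + 3] * 256 + (0xff & (int)pcm[i * 4 + 2]);

        /* Scale each channel to roughly [-1, 1], then average the two. */
        mono[i] = (left / 32768.f + right / 32768.f) * 0.5f;
    }
}
```

With left = right = 16384 the output sample is 0.5; with left = -32768 and right = 0 it is -0.5.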