c#, file, audio, wav

What is actually contained in the data chunk of a WAV file?


For example, take a stereo WAV file with a sample rate of 44100 Hz and a bit depth of 16 bits.

How exactly are those 16 bits divided up?


In the audio clip I was using, the first 4 bytes held data for the first audio channel. I have no idea what the next 4 bits are (even when I replace them with 0, there is no effect on the final audio file).


The next 4 bytes held data for the second audio channel, and again I have no idea what the following 4 bits are (replacing them with 0 has no effect on the final audio file either).

So I would like to figure out what those 4 bits are.


Solution

  • A WAV file typically starts with a 44-byte header (longer if it contains extra chunks such as LIST), followed by the payload, which is the uncompressed raw PCM audio data. As you walk across the PCM data, each sample frame (one point in time on the audio curve) contains data for all channels, and the header tells you how many channels there are. For stereo at a bit depth of 16, you will see two bytes (16 bits == bit depth) for the first (left) channel, immediately followed by two bytes for the second (right) channel, and so on, so each 16-bit stereo frame is 4 bytes long.
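
To make the header fields concrete, here is a minimal Python sketch that reads the canonical 44-byte layout with the standard struct module. It assumes the simplest case: a little-endian RIFF file whose 16-byte fmt chunk is immediately followed by the data chunk header. Real files may carry extra chunks, so a robust reader should walk the chunk list instead. The function name parse_canonical_header is hypothetical.

```python
import struct


def parse_canonical_header(buf: bytes) -> dict:
    """Parse a canonical 44-byte RIFF/WAVE header (hypothetical helper).

    Assumes a little-endian 'RIFF' file whose 16-byte 'fmt ' chunk starts at
    offset 12 and whose 'data' chunk header sits at offset 36. Files with
    extra chunks (e.g. LIST) need a parser that walks the chunk list instead.
    """
    if len(buf) < 44:
        raise ValueError("need at least 44 bytes")
    riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", buf[0:12])
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a little-endian RIFF/WAVE file")
    (fmt_id, fmt_size, audio_format, num_channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack("<4sIHHIIHH", buf[12:36])
    if fmt_id != b"fmt " or fmt_size != 16:
        raise ValueError("fmt chunk is not the canonical 16-byte layout")
    data_id, data_size = struct.unpack("<4sI", buf[36:44])
    if data_id != b"data":
        raise ValueError("data chunk does not start at offset 36")
    return {
        "audio_format": audio_format,        # 1 = integer PCM, 3 = IEEE float
        "num_channels": num_channels,        # 2 for stereo
        "sample_rate": sample_rate,          # frames per second, e.g. 44100
        "byte_rate": byte_rate,              # sample_rate * block_align
        "block_align": block_align,          # bytes per frame, all channels
        "bits_per_sample": bits_per_sample,  # bit depth, e.g. 16
        "data_size": data_size,              # payload length in bytes
    }
```

For the file in the question you would expect num_channels 2, bits_per_sample 16, block_align 4 (2 channels x 2 bytes) and byte_rate 176400 (44100 x 4).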

    For a given channel, its bytes (2 bytes in your case) can appear in one of two layouts depending on endianness: most significant byte first, or least significant byte first, so byte order matters here. The four-character ID at the very start of the header tells you which one applies: RIFF means little endian (the standard WAV case), while the rare RIFX variant is big endian.
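
A small Python sketch of the byte-order point: it checks the RIFF/RIFX ID and shows that the same two bytes decode to different values depending on which order you assume. The helper name wav_byte_order is hypothetical; int.from_bytes is the standard Python built-in.

```python
def wav_byte_order(header: bytes) -> str:
    """Return 'little' or 'big' from the RIFF/RIFX chunk ID (hypothetical helper)."""
    chunk_id = header[:4]
    if chunk_id == b"RIFF":
        return "little"  # standard WAV: multi-byte fields are little endian
    if chunk_id == b"RIFX":
        return "big"     # rare big-endian variant of the same container
    raise ValueError(f"unexpected chunk ID {chunk_id!r}")


# The same two bytes, as they might appear in the file, read both ways:
pair = bytes([0x34, 0x12])
as_little = int.from_bytes(pair, "little")  # 0x1234 == 4660
as_big = int.from_bytes(pair, "big")        # 0x3412 == 13330
```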

    Each channel generates its own audio curve.
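
To illustrate one curve per channel, here is a Python sketch that de-interleaves the data payload of a 16-bit stereo file into separate left and right lists. It assumes signed 16-bit little-endian integer PCM with the left channel stored first in each frame; split_stereo_16 is a hypothetical name.

```python
import struct


def split_stereo_16(payload: bytes) -> tuple[list[int], list[int]]:
    """De-interleave 16-bit signed little-endian stereo PCM (hypothetical helper).

    Each 4-byte frame is laid out as [left lo, left hi, right lo, right hi].
    """
    if len(payload) % 4:
        raise ValueError("payload length must be a multiple of 4 bytes (one stereo frame)")
    left, right = [], []
    # "<hh" = two little-endian signed 16-bit integers per frame
    for left_val, right_val in struct.iter_unpack("<hh", payload):
        left.append(left_val)
        right.append(right_val)
    return left, right
```

For example, the 8 bytes 01 00 FF FF 00 80 FF 7F are two frames, giving a left curve of [1, -32768] and a right curve of [-1, 32767].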

    In your code, to convert PCM data into a usable audio curve data point you must combine all the bytes of a given sample for a given channel into a single value. Typically it is an integer rather than floating point; again, the header defines which (in the fmt chunk, audio format 1 is integer PCM and 3 is IEEE float). For integer PCM, WAV stores 8-bit samples as unsigned values and 16-bit and wider samples as signed two's complement. Little endian means that, as you read the file, the first (leftmost) byte of a sample is the least significant byte, and each subsequent byte is the next more significant one.
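
In Python the built-in int.from_bytes does this combining in one call. The sketch below assumes the usual WAV integer PCM conventions described above (8-bit unsigned, wider depths signed) and decodes one channel's sample; decode_pcm_sample is a hypothetical name.

```python
def decode_pcm_sample(sample_bytes: bytes) -> int:
    """Decode one channel's little-endian integer PCM sample (hypothetical helper).

    Assumes WAV conventions: 8-bit samples are unsigned (0..255, silence = 128);
    16-, 24- and 32-bit samples are signed two's complement.
    """
    if not 1 <= len(sample_bytes) <= 4:
        raise ValueError("expected 1 to 4 bytes per sample")
    is_signed = len(sample_bytes) > 1
    # First byte in the file is the least significant byte, hence "little".
    return int.from_bytes(sample_bytes, "little", signed=is_signed)
```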

    In pseudocode:

    int mydatapoint;  // allocate your integer audio curve data point
    

    step 0

    mydatapoint = most_significant_byte;  // for little endian, this is the LAST byte of the sample in the file
    

    Stop here for a bit depth of 8.

    If the bit depth is greater than 8 bits, shift the value left to make room for the following byte.

    step 1

    mydatapoint = mydatapoint << 8;  // shift the bits left by 8, which multiplies
                                     // the value by 256 and leaves the rightmost
                                     // 8 bits empty (zero) for the next byte
    

    step 2

    // the following operation is a bitwise OR
    mydatapoint = mydatapoint | next_most_significant_byte;
    

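    Now repeat steps 1 and 2 for each remaining byte of the sample, going from most significant to least significant (for little endian, that means walking the sample's bytes backwards from the last one in the file to the first). For 16-bit audio you do steps 1 and 2 once; for 24-bit or 32-bit audio you repeat them so that 3 or 4 bytes of PCM data end up combined into a single integer audio curve data point. For signed formats (16 bits and above), finish with sign extension: if the top bit of the combined value is set, subtract 2^bits (for example, the bytes FF FF give 65535 after shifting, which must become -1).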
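
Here are steps 0 to 2 as runnable Python, including the sign-extension step. It is a sketch that assumes a little-endian sample whose bytes are passed in file order, and follows the WAV convention that 8-bit samples are unsigned. combine_sample_bytes is a hypothetical name; in real code int.from_bytes(..., signed=True) gives the same result.

```python
def combine_sample_bytes(file_order_bytes: bytes) -> int:
    """Build one sample value by shifting and OR-ing bytes (hypothetical helper).

    file_order_bytes: the 1-4 bytes of a single channel's sample, exactly as
    they appear in a little-endian WAV file (least significant byte first).
    """
    n = len(file_order_bytes)
    if not 1 <= n <= 4:
        raise ValueError("expected 1 to 4 bytes per sample")
    # Little endian: the LAST byte in the file is the most significant,
    # so walk the bytes backwards (most significant -> least significant).
    msb_first = file_order_bytes[::-1]
    value = msb_first[0]        # step 0: start with the most significant byte
    for b in msb_first[1:]:
        value = value << 8      # step 1: make room for 8 more bits
        value = value | b       # step 2: OR in the next byte
    if n == 1:
        return value            # 8-bit WAV PCM is unsigned: 0..255
    # Sign extension for two's complement: a set top bit means negative.
    bits = 8 * n
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value
```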

    Why are we doing this bit-shifting nonsense?

    The level of audio fidelity when converting from analog to digital depends on how accurately you record the audio curve. Analog audio is a continuous curve, but to become digital it must be sampled at discrete points along that curve. Two factors determine the fidelity of the resulting digital representation: the horizontal (time) spacing between points is set by the sample rate, and the vertical resolution of each point is set by the bit depth. A higher sample rate gives you more samples per second, and a greater bit depth gives you more distinct vertical values with which to approximate the instantaneous height of the analog audio curve.
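
A short Python sketch of the horizontal (sample rate) side: a continuous sine wave is read only at the discrete times t = n / sample_rate, so at 44100 Hz consecutive points are 1/44100 s (about 22.7 microseconds) apart. The sine input and the name sample_sine are illustrative assumptions, not part of the WAV format.

```python
import math


def sample_sine(freq_hz: float, sample_rate: int, num_samples: int) -> list[float]:
    """Sample a continuous sine wave at t = n / sample_rate (illustrative helper)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


# A tone at a quarter of the sample rate gets exactly 4 points per cycle.
points = sample_sine(11025, 44100, 4)
```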

    bit depth  8 == 2^8  ==   256 distinct vertical values to record curve height
    bit depth 16 == 2^16 == 65536 distinct vertical values to record curve height
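
The counts above can be turned into the actual integer ranges a WAV sample can hold. This Python sketch assumes the WAV integer PCM conventions (8-bit unsigned, wider depths signed two's complement); pcm_range is a hypothetical name.

```python
def pcm_range(bits: int) -> tuple[int, int, int]:
    """Return (min_code, max_code, levels) for WAV integer PCM (hypothetical helper).

    Assumes 8-bit samples are unsigned and 16/24/32-bit samples are signed
    two's complement. Either way there are 2**bits distinct values.
    """
    if bits not in (8, 16, 24, 32):
        raise ValueError("expected a bit depth of 8, 16, 24 or 32")
    levels = 1 << bits
    if bits == 8:
        return 0, levels - 1, levels
    half = levels // 2
    return -half, half - 1, levels
```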
    

    So, to record the height of our analog audio curve more accurately, we want to be as granular as possible, so that the resulting digital curve is smooth rather than jagged. It would be jagged if we only allocated 2 bits, which gives 2^2 = 4 distinct values; try connecting the dots when your plot only has 4 vertical values to choose from. The bit shifting simply builds up a single integer value from several bytes of data: values above 255 cannot fit into one byte, so they must be spread across multiple bytes of PCM data.
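
To put numbers on "jagged", here is a Python sketch that quantizes a few curve heights in [-1.0, 1.0] at 2 bits and at 16 bits and reports the worst reconstruction error. The scale-and-round scheme, the sample values and the names quantize and max_error are illustrative assumptions; real converters may also apply dither. With this symmetric scaling, 2 bits only uses 3 of its 4 codes.

```python
def quantize(x: float, bits: int) -> int:
    """Quantize x in [-1.0, 1.0] to a signed integer code (illustrative helper).

    Scales by the largest positive code, rounds, then clamps to the signed range.
    """
    max_code = (1 << (bits - 1)) - 1  # 32767 for 16 bits, 1 for 2 bits
    min_code = -(1 << (bits - 1))     # -32768 for 16 bits, -2 for 2 bits
    code = round(x * max_code)
    return max(min_code, min(max_code, code))


def max_error(values: list[float], bits: int) -> float:
    """Largest gap between each value and its quantized reconstruction."""
    max_code = (1 << (bits - 1)) - 1
    return max(abs(v - quantize(v, bits) / max_code) for v in values)


curve = [0.0, 0.26, 0.5, 0.71, 0.97, -0.26, -0.71, -1.0]
coarse = max_error(curve, 2)   # 2 bits: heights snap to -1, 0 or 1
fine = max_error(curve, 16)    # 16 bits: error at most 0.5 / 32767
```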

    https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/

    https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html