I'm writing a C# application to process WAV files, and I've written enough code to read through a file's chunks (i.e., it reads the format metadata and all other chunks, so I'm ready to process the data with that information).
I'm now at the point where I have to process the data chunk, but I have no idea how the samples are pieced together and haven't found resources explaining it. If possible, please answer with links/info about only the data chunk, not how WAV files are structured in general.
I need to learn more about how samples over time are structured byte-by-byte.
Thank you! If I did something wrong with the question, please comment so I can rephrase/edit the post.
This certainly depends on the codec used, but we'll assume PCM since that's by far the most common thing you'll find in WAV files.
PCM is a way of encoding the measurement of pressure at a particular instant in time. If I measure pressure levels fast enough, and with enough resolution, I can accurately approximate the original waveform.
For background, see the Wikipedia article on PCM: https://en.wikipedia.org/wiki/Pulse-code_modulation
Since you're already parsing the header, you know the sample rate. That's the number of samples taken per second. 44,100 samples per second (a 44.1 kHz sample rate) is standard CD audio; for video, 48 kHz is more common.
From the header, you also know the bits per sample, which indicates the resolution of each sample. 16-bit samples take 2 bytes each.
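For example, a tiny sketch of the arithmetic; the variable names are just assumptions standing in for whatever your header parser exposes:

```csharp
using System;

// Values you already have from the fmt chunk (names are illustrative).
int sampleRate    = 44100; // samples taken per second, per channel
int bitsPerSample = 16;    // resolution of each sample

int bytesPerSample = bitsPerSample / 8; // 16 bits -> 2 bytes per sample
Console.WriteLine($"{bytesPerSample} bytes per sample at {sampleRate} Hz");
```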
In the audio data, the samples are just the numeric values, one after the other.
[sample 0][sample 1][sample 2][...]
The header also indicates the channel count, which tells you how many discrete channels were sampled: mono is 1, stereo is 2, 5.1 surround is 6. The sample values themselves are interleaved, one sample per channel, and the group of samples for a single instant in time is called a frame. If I had a stereo track with left/right channels, it would look something like this:
[L][R][L][R][L][R][L][R][...]
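Putting those header fields together, you can compute the frame size, the total number of frames, and the byte offset of any individual sample. A minimal sketch, with assumed values standing in for what your parser read (`dataChunkSize` is the size field of your data chunk):

```csharp
using System;

// Assumed header values (use whatever your own chunk parser produced).
int channels       = 2;         // stereo
int bytesPerSample = 2;         // 16-bit PCM
int sampleRate     = 44100;
int dataChunkSize  = 1_764_000; // example size of the data chunk in bytes

int frameSize      = channels * bytesPerSample;        // 4 bytes per frame here
int totalFrames    = dataChunkSize / frameSize;        // number of frames in the chunk
double durationSec = (double)totalFrames / sampleRate; // length of the audio in seconds

// Byte offset (within the data chunk) of frame n, channel c (0 = left, 1 = right).
int OffsetOf(int n, int c) => n * frameSize + c * bytesPerSample;

Console.WriteLine($"{totalFrames} frames, {durationSec:F2} s; frame 100, right channel starts at byte {OffsetOf(100, 1)}");
```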
To actually read those numeric values, note that the data is written little-endian. For 16-bit samples and higher, signed integers are used; 8-bit samples are stored as unsigned bytes centered on 128.
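Here's a minimal sketch of reading interleaved 16-bit samples with a BinaryReader (which always reads little-endian, regardless of platform). The file name, chunk offset, and chunk size below are placeholder assumptions; in your application those come from the chunk parsing you've already written:

```csharp
using System;
using System.IO;

class PcmReader
{
    // Reads a 16-bit PCM data chunk into one short[] per channel.
    // Assumes the reader is positioned at the first byte of the sample data.
    static short[][] ReadSamples(BinaryReader reader, int dataChunkSize, int channelCount)
    {
        const int bytesPerSample = 2;                    // 16-bit PCM
        int frameCount = dataChunkSize / (channelCount * bytesPerSample);

        var samples = new short[channelCount][];
        for (int c = 0; c < channelCount; c++)
            samples[c] = new short[frameCount];

        // Frames are interleaved ([L][R][L][R]...), so read one sample per channel per frame.
        for (int frame = 0; frame < frameCount; frame++)
            for (int c = 0; c < channelCount; c++)
                samples[c][frame] = reader.ReadInt16();  // reads 2 bytes, little-endian, signed

        return samples;
    }

    static void Main()
    {
        // Hypothetical file name and chunk values; in your program these come
        // from the chunk-parsing code you already have.
        const string path = "input.wav";
        long dataChunkOffset = 44;      // offset of the sample data (44 is typical for a minimal canonical WAV)
        int dataChunkSize = 1_764_000;  // value of the data chunk's size field

        using var stream = File.OpenRead(path);
        using var reader = new BinaryReader(stream);

        stream.Position = dataChunkOffset;
        short[][] perChannel = ReadSamples(reader, dataChunkSize, channelCount: 2);

        // short.MinValue/short.MaxValue are full negative/positive amplitude;
        // divide by 32768f if you want floats in the range [-1, 1).
        Console.WriteLine($"Read {perChannel[0].Length} frames per channel");
    }
}
```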