audio pcm

What does a sample of audio data represent?


I want to know what a single sample of audio data (uncompressed PCM) represents.

It is a number, but what exactly is that number and how come it can be converted back to audio?

For example if it is a 4-bit sample, does 0 represent absolute silence and 15 represent max volume?

If it is volume, what frequency are we talking about? How is the information about the frequency stored?

In songs we can hear various instruments (frequencies) at the same time, meaning each frequency is somehow stored in a single sample. How is that done?


Solution

  • Audio is just a curve that wobbles up and down as time runs left to right. A sample is a measurement of the curve's height at one point in time. Silence is when the curve does not wobble: it flatlines at the middle value of its range (zero if the range is symmetric around zero). When the curve reaches its maximum height, up or down, that stretch of audio is as loud as the format can represent.

    Normalization is important here. The absolute range of curve values is arbitrary and could be anything. Let's say the max is 15 and the min is 0, as with a 4-bit sample. Silence is no wobble, so it sits in the middle of that range, at about 7, not at 0. The values 0 and 15 are the two loudest extremes, one in each direction.

    Curves can be encoded with any number of bits. The bit depth determines how many horizontal lines you slice the curve's height into: n bits give 2^n lines, so more bits means greater accuracy in each sample's measurement of the curve height.
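To make the "horizontal lines" idea concrete, here is a minimal sketch of quantization. It assumes the curve height has already been normalized to the range -1.0 to 1.0; the function names `quantize` and `dequantize` are made up for this illustration and do not come from any library.

```python
import math

def quantize(value, bits):
    """Map a curve height in [-1.0, 1.0] to an unsigned integer code."""
    levels = 2 ** bits                      # 4 bits -> 16 horizontal lines
    clipped = max(-1.0, min(1.0, value))    # anything louder than max is clipped
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest line.
    return int(math.floor((clipped + 1.0) / 2.0 * (levels - 1) + 0.5))

def dequantize(code, bits):
    """Map an integer code back to an approximate curve height."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0

lowest = quantize(-1.0, 4)    # bottom of the range
silence = quantize(0.0, 4)    # middle of the range, not 0
highest = quantize(1.0, 4)    # top of the range

# The same height stored with more bits comes back more accurately.
error_4bit = abs(dequantize(quantize(0.3, 4), 4) - 0.3)
error_8bit = abs(dequantize(quantize(0.3, 8), 8) - 0.3)
```

With 4 bits the codes run from 0 to 15 and silence lands in the middle of that range (7 or 8, depending on how you round), which is why 0 is not silence in an unsigned format.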

    [image: sine wave]

    A sine or cosine curve is considered a pure tone. Joseph Fourier showed that an arbitrary curve (audio or otherwise) can be represented as a set of sine curves of (A) various amplitudes (max up/down), (B) various frequencies, and (C) various phase offsets. Interestingly, this transformation works in either direction: from a curve of arbitrary shape into a set of (A/B/C), or from a set of (A/B/C) back into a curve of arbitrary shape (this is how additive audio synthesizers work).
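Both directions of that transformation can be sketched in a few lines. This is only an illustration under stated assumptions: the sample rate of 64, the two made-up partials, and the helper names `synthesize` and `dft_amplitudes` are all hypothetical, and the transform is a naive discrete Fourier transform rather than the fast version a real program would use.

```python
import math

SAMPLE_RATE = 64  # samples per second; hypothetical, kept tiny so the naive DFT is quick

def synthesize(partials, n_samples, sample_rate=SAMPLE_RATE):
    """Sum sine curves given as (amplitude, frequency_hz, phase_radians)."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate + p)
            for a, f, p in partials)
        for n in range(n_samples)
    ]

def dft_amplitudes(samples):
    """Naive discrete Fourier transform: amplitude found in each frequency bin."""
    count = len(samples)
    amplitudes = []
    for k in range(count // 2):
        re = sum(s * math.cos(2 * math.pi * k * n / count)
                 for n, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * n / count)
                 for n, s in enumerate(samples))
        scale = 1 if k == 0 else 2   # non-zero bins split their energy in two
        amplitudes.append(scale * math.hypot(re, im) / count)
    return amplitudes

# (A/B/C) -> curve: a 3 Hz tone plus a quieter, phase-shifted 10 Hz tone.
partials = [(1.0, 3, 0.0), (0.5, 10, math.pi / 4)]
curve = synthesize(partials, SAMPLE_RATE)   # exactly one second of samples

# curve -> (A/B): with one second of data, bin k corresponds to k Hz.
amps = dft_amplitudes(curve)
peaks = [k for k, a in enumerate(amps) if a > 0.1]
```

Here `curve` is nothing but a list of heights, yet `peaks` recovers the two frequencies that went into it, and `amps` at those bins recovers their amplitudes.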

    Frequency information is not stored separately; it is baked into the shape of the curve. It's all about how often the curve wobbles up and down. Lazy wobbles that take a long time to cross from below to above the middle line are low frequency; a stretch of tightly spaced squiggles is a high-frequency squawk.
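One rough way to see that frequency lives in the wobble rate is to count how often the curve crosses the middle line going upward. The sketch below assumes a single pure tone, a sample rate of 8000, and a small starting phase offset chosen so no sample lands exactly on the middle line; `tone` and `estimate_frequency` are names invented for this example. It would not work on a mix of several tones.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this example)

def tone(freq_hz, seconds=1.0):
    """One pure tone, starting just below the middle line."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE - 0.1)
            for n in range(n_samples)]

def count_upward_crossings(samples):
    """Count crossings of the middle line (0) from below to above."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)

def estimate_frequency(samples, sample_rate=SAMPLE_RATE):
    """Crossings per second, which approximates the frequency of a pure tone."""
    duration = len(samples) / sample_rate
    return count_upward_crossings(samples) / duration

lazy = estimate_frequency(tone(5))       # slow wobble, low frequency
squawk = estimate_frequency(tone(440))   # tight squiggles, high frequency
```

Nothing in the sample values says "5 Hz" or "440 Hz"; the estimates come only from how quickly the heights swing past the middle.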

    When a microphone records several people talking at once, or various instruments each emitting their own sounds, there are many simultaneous frequencies, yet the recording somehow just works. How? Think of what happens inside the microphone (or to your eardrum). Its diaphragm, with the coil attached to it, can be considered a flat surface that can only get pushed one way or the other, period. Like the eardrum, it only moves back and forth. All the sounds arriving at once add up to a single motion, and that motion is one arbitrary curve, which at each point in time has a single height somewhere between its max and min.
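The "one surface, one motion" picture amounts to adding the individual curves together, sample by sample. A minimal sketch, assuming two pure tones standing in for two instruments; the frequencies, amplitudes, and the helper names `sine` and `mix` are illustrative only.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this example)

def sine(freq_hz, amplitude, n_samples):
    """Samples of one pure tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def mix(*tracks):
    """Add tracks sample by sample, as the diaphragm only feels the total push."""
    return [sum(values) for values in zip(*tracks)]

flute = sine(440, 0.5, 16)   # stand-in for a higher instrument
bass = sine(110, 0.4, 16)    # stand-in for a lower instrument
mixed = mix(flute, bass)     # still just one number per point in time
```

Each entry of `mixed` is a single height, and both tones are present in it because the shape of the summed curve carries both wobble rates at once.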