I already asked about audio volume normalization. On most methods (e.g. ReplayGain, which I am most interested in), I might get peaks that exceed the PCM limit (as can also be read here).
Simple clipping would probably be the worst thing I can do. As Wikipedia suggests, I should do some form of dynamic range compression.
I am speaking about the function which I'm applying on each individual PCM sample value. On another similar question, one answer suggests that doing this is not enough or not the thing I should do. However, I don't really understand that as I still have to handle the clipping case. Does the answer suggest to do the range compression on multiple samples at once and do to simple hard clipping in addition on every sample?
Leaving that aside, the functions discussed in the Wikipedia article seem to be somewhat not what I want (in many cases, I would still have the clipping case in the end). I am thinking about using something like tanh. Is that a bad idea? It would reduce the volume slightly but guarantee that I don't get any clipping.
My application is a generic music player. I am searching for a solution which mostly works best for everyone so that I can always turn it on and the user very likely does not want to turn this off.
Using any instantaneous dynamic range processing (such as clipping or tanh non-linearity) will introduce audible distortion. Put a sine wave into an instantaneous non-linear function and you no longer have a sine wave. While useful for certain audio applications, it sounds like you do not want these artefacts.
Normalization does not effect the dynamics (in terms of min/max ratio) of a waveform. Normalization involves element-wise multiplication of a waveform by a constant scalar value to ensure no samples ever exceed a maximum value. This process can only by done off-line, as you need to analyse the entire signal before processing. Normalization is also a bad idea if your waveform contains any intense transients. Your entire signal will be attenuated by the ratio of the transient peak value divided by the clipping threshold.
If you just want to protect the output from clipping you are best off using a side chain type compressor. A specific form of this is the limiter (infinite compression ratio above a threshold with zero attack time). A side-chain compressor calculates the smoothed energy envelope of a signal and then applies a varying gain according to that function. They are not instantaneous, so you reduce audible distortion that you'd get from the functions you mention. A limiter can have instantaneous attack to prevent from clipping, but you allow a release time so that the limiter remains attenuating for subsequent waveform peaks, the subsequent waveform is just turned down and so there is no distortion. After the intense sound, the limiter recovers.
You can get a pumping type sound from this type of processing if there are a lot of high intensity peaks in the waveform. If this becomes problematic, you can then move to the next level and do the dynamics processing within sub-bands. This way, only the offending parts of the frequency spectrum will be attenuated, leaving the rest of the sound unaffected.