Why does each audio part get louder in FFmpeg when I mix them into one audio track?

I am trying to dub an audio track. I have the original audio track and I want to put the translated audio parts on top of it.

translated audio 100% vol: --p1--- ---p2-- -----p3--- --p4--

original audio 5% vol: -----------------------------------------

Here is my FFmpeg command with filter_complex:

ffmpeg -i video_wpmXlZF4XiE.opus -i 989-audio.mp3 -i 989-audio.mp3 -i 989-audio.mp3 -i 989-audio.mp3 \
-filter_complex "\
[0:a]loudnorm=I=-14:TP=-2:LRA=7, volume=0.05[original]; \
[1:a]loudnorm=I=-14:TP=-2:LRA=7, adelay=5000|5000, volume=1.0[sent1]; \
[2:a]loudnorm=I=-14:TP=-2:LRA=7, adelay=10000|10000, volume=1.0[sent2]; \
[3:a]loudnorm=I=-14:TP=-2:LRA=7, adelay=20000|20000, volume=1.0[sent3]; \
[4:a]loudnorm=I=-14:TP=-2:LRA=7, adelay=30000|30000, volume=1.0[sent4]; \
[original][sent1][sent2][sent3][sent4]amix=inputs=5:duration=longest[out]" \
-map "[out]" output.mp3

The clips I put on top of the original audio track are all the same file, 989-audio.mp3. I did this on purpose to show the problem. Here are the audio levels of the final generated track.

As you can see, the first and second parts differ only slightly, but the third and fourth have a much higher volume level (note that the audio is the same). Why does this happen, and how can I work around this odd behaviour?

Solution

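By default, the amix filter does not simply sum its inputs: it scales their volume depending on the number of inputs active at that instant, as per the scheme described in this answer. With five inputs, each one starts at roughly 1/5 of its level, and as the earlier clips end, the remaining inputs are scaled up, which is why the later parts come out louder even though the audio is the same. You can avoid this adjustment by adding the option normalize=0 to amix.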
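For illustration, here is a sketch of the command from the question with normalize=0 added to amix, assuming your FFmpeg build is recent enough to support that option. The variable names norm and filter and the function name mix_dub are hypothetical helpers introduced only to keep the example readable; the file names, delays and loudnorm settings are copied from the question.

```shell
# Per-clip loudness settings, unchanged from the question.
norm='loudnorm=I=-14:TP=-2:LRA=7'

# Build the filtergraph piece by piece (same chains as in the question).
filter="[0:a]${norm},volume=0.05[original];"
filter="${filter}[1:a]${norm},adelay=5000|5000,volume=1.0[sent1];"
filter="${filter}[2:a]${norm},adelay=10000|10000,volume=1.0[sent2];"
filter="${filter}[3:a]${norm},adelay=20000|20000,volume=1.0[sent3];"
filter="${filter}[4:a]${norm},adelay=30000|30000,volume=1.0[sent4];"
# The only change: normalize=0 makes amix sum the inputs without rescaling them.
filter="${filter}[original][sent1][sent2][sent3][sent4]amix=inputs=5:duration=longest:normalize=0[out]"

# Hypothetical wrapper so the mix runs only when called explicitly.
mix_dub() {
  ffmpeg -i video_wpmXlZF4XiE.opus \
    -i 989-audio.mp3 -i 989-audio.mp3 -i 989-audio.mp3 -i 989-audio.mp3 \
    -filter_complex "$filter" \
    -map "[out]" output.mp3
}
```

Calling mix_dub then writes output.mp3. Keep in mind that with normalize=0 the inputs are plainly summed, so passages where several loud inputs overlap can clip; with the original track at 5% volume there should be reasonable headroom here, but it is worth checking the levels of the result.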