Bash 5.2 crashes due to an assertion failure in malloc, but only when run under Valgrind and only when LC_CTYPE is set. Here's an example output:
$ path/to/env - foo=bar LC_CTYPE=C.UTF-8 path/to/valgrind path/to/bash -c 'echo ${foo#spam}'
...
malloc: subst.c:5331: assertion botched
free: called with unallocated block argument
Aborting...==2753214==
==2753214== Process terminating with default action of signal 6 (SIGABRT): dumping core
==2753214== at 0x48DFA8C: __pthread_kill_implementation (in /nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
==2753214== by 0x4890C85: raise (in /nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
==2753214== by 0x487A8B9: abort (in /nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8/lib/libc.so.6)
==2753214== by 0x443AF9: programming_error (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x4ACAC4: internal_free.constprop.0 (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x450A5E: remove_pattern (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x465D2B: parameter_brace_remove_pattern (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x46023A: param_expand (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x460CD9: expand_word_internal (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x466C0D: shell_expand_word_list.constprop.0 (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x467479: expand_words (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
==2753214== by 0x4361CE: execute_command_internal (in /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash)
...
==2753214== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
/nix/store/a683qmhmrrzrwn8fmqh53yyylm7yn2hq-test.sh: line 2: 2753214 Aborted (core dumped) /nix/store/v45j2p2izb3pa2fxdw978bahhkb2ghza-toybox-0.8.10/bin/env - LC_CTYPE=C.UTF-8 /nix/store/14fg82n6grqhrd2algx31sv1kmgvz0gl-valgrind-3.21.0/bin/valgrind /nix/store/vqvj60h076bhqj6977caz0pfxs6543nb-bash-5.2-p15/bin/bash -c 'echo ${PATH#":"}'
(full output here)
${parameter#word} is a kind of parameter expansion described here.
The indicated line of source code points here, but is the problematic assertion in free or malloc?
Experimenting with some variations:
- If foo is unset or set to the empty string, Bash succeeds (no crash); any non-empty value of foo seems to cause a crash.
- When the pattern matches foo, Bash crashes at subst.c:5336 instead of subst.c:5331. Both cases crash, whether or not the pattern is matched by the parameter expansion, but in slightly different places.
- If LC_CTYPE is unset or set to any other locale (including non-existent locales), Bash does not crash (although there is a non-fatal invalid free()).

How should I go about debugging this problem?
A note on reproducibility: if you copy flake.nix and flake.lock to an empty directory, you should be able to type nix run and (hopefully) get a crash too.

I would create a special build of Bash in which Bash's malloc wrapping is disabled and try to reproduce the problem under Valgrind as before.
You're running into the issue that Bash itself is self-diagnosing a malloc issue. It will not do as good a job as Valgrind itself.
Bash's diagnostic is saying that free was called on an unallocated block. A similar diagnostic from Valgrind is more informative. If an allocated object existed at that address previously, Valgrind will show that, along with a backtrace where it was freed.
Bash's debugging malloc (see the internal_free function in lib/malloc/malloc.c) relies on checking some header information in the freed block to conclude that it's a double free. That is not accurate. The code looks like:
if (p->mh_alloc != ISALLOC)
  {
    if (p->mh_alloc == ISFREE)
      xbotch (mem, ERR_DUPFREE,
              _("free: called with already freed block argument"), file, line);
    else
      xbotch (mem, ERR_UNALLOC,
              _("free: called with unallocated block argument"), file, line);
  }
If the magic byte found in the block header is not ISALLOC (that being 0xF7), it checks specifically for ISFREE (0x54). If it's not ISFREE either, then it emits the diagnostic you are seeing.
This is wide open to a false positive, because it occurs whenever the magic byte in the header has been clobbered to any value other than those two, out of 256 possible values.
We cannot reasonably believe this to be a double free problem. It is quite likely corruption, and Valgrind's allocator will do a much better job of diagnosing it, if it reproduces.