I was experimenting with extern and extern "C" for a little while, and accidentally had a typo in one of the identifiers: a $ had snuck in. When I compiled the code, I got an undefined-symbol error. Once I saw what had caused it, I got curious whether an identifier containing $ would actually compile. And guess what: Clang actually did compile it.
According to documentation I had read previously, the rules for identifiers were basically: a-z, A-Z or 0-9 and _. But this compiled just fine, without even a warning:
void __this$is$a$mess() {}
int main() { __this$is$a$mess(); }
When looking at it:
Ingwie@Ingwies-Macbook-Pro.local /tmp $ clang y.c
Ingwie@Ingwies-Macbook-Pro.local /tmp $ nm a.out
0000000100000f90 T ___this$is$a$mess
0000000100000000 T __mh_execute_header
0000000100000fa0 T _main
U dyld_stub_binder
I can see the symbol name very clearly.
So why is it that Clang will let me do this, although by ANSI standards, it should not? Even the GCC 6 I have installed did not warn or error about this.
Which compilers allow which kinds of identifiers, and why?
The rules in the 2018 C standard for identifiers include:
The first character may be _, a to z, A to Z, a universal-character-name, or “other implementation-defined characters”.
The remaining characters may be any of those or a digit, 0 to 9.
A universal-character-name is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, which specify Unicode characters.
So, if an implementation allows $, that is a valid character for that implementation. You may use it, but it may not be portable to other implementations. The C standard requires implementations to accept the specific characters listed, but it allows them to accept more. Generally, the C standard should be viewed as an open field rather than a walled garden: the behavior is defined within the field, but you are not stopped at the barrier; you may go beyond it, at your own risk.
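As a rough illustration of the two cases, the sketch below puts an implementation-defined character ($) and a universal-character-name side by side. All names in it (price$usd, caf\u00e9, check_extended_names) are made up for this example, and it assumes a compiler such as a recent GCC or Clang that accepts $ in identifiers and is running in C99 mode or later. Call check_extended_names from your own main to run it.

```c
#include <assert.h>

/* '$' is outside the characters every C implementation must accept. It is
   valid here only because the compiler treats it as one of the "other
   implementation-defined characters" (an assumption about the compiler). */
int price$usd(void) { return 42; }

/* \u00e9 is a universal-character-name (U+00E9, "é"). The standard itself
   allows it in identifiers from C99 on, so no extension is involved. */
int caf\u00e9(void) { return 7; }

/* Hypothetical self-check: both names are ordinary identifiers once accepted. */
void check_extended_names(void)
{
    assert(price$usd() == 42);
    assert(caf\u00e9() == 7);
}
```

Only the first function relies on an extension. GCC documents an -fdollars-in-identifiers option that controls whether $ is accepted; whether it is on by default depends on the target.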
The rules you were taught were rules for what is portable, not rules for what the C standard requires implementations to restrict you to.
The C standard defines strictly conforming code, which is, roughly speaking, code that should work in any C implementation, and conforming code, which is code that works in at least one C implementation. Conforming code is still C code. So the rules you were taught were for strictly conforming code.
Generally, you should prefer to write strictly conforming code and only use additional features when the benefit (speed, ease of development on a particular platform, whatever) is worth the cost (loss of portability).
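For comparison, here is a sketch of the function from the question written with only the characters every implementation must accept. It also drops the leading double underscore, because the standard reserves identifiers that begin with two underscores for the implementation. The counter and the names mess_call_count and run_portable_example are hypothetical additions, there only so that the call has an observable effect. run_portable_example stands in for the question's main.

```c
#include <assert.h>

/* Hypothetical counter, present only so the call below can be observed. */
static int mess_calls = 0;

/* Portable counterpart of __this$is$a$mess: letters and '_' only, and no
   leading underscore, so the name is not reserved for the implementation. */
void this_is_a_mess(void) { ++mess_calls; }

/* Hypothetical accessor for the counter. */
int mess_call_count(void) { return mess_calls; }

/* Hypothetical driver playing the role of main in the question. */
void run_portable_example(void)
{
    this_is_a_mess();
    assert(mess_call_count() >= 1);
}
```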