
ld error messages: why is the first single quote actually a backtick?


Consider the error messages generated by ld, for example these (taken from this answer):

/home/AbiSfw/ccvvuHoX.o: In function `main':
prog.cpp:(.text+0x10): undefined reference to `x'
prog.cpp:(.text+0x19): undefined reference to `foo()'
prog.cpp:(.text+0x2d): undefined reference to `A::~A()'

It can be seen that the single quotes are not balanced, and what should be the first single quote (') is in fact a backtick (`).

Why is that? Is there a specific reason, or is this a historical typo? Has this ever been logged as a UI bug? Has it never been fixed because it is just too late to fix it, as fixing it now would create a mismatch of errors generated?

I would expect the error messages to look like this:

/home/AbiSfw/ccvvuHoX.o: In function 'main':
prog.cpp:(.text+0x10): undefined reference to 'x'
prog.cpp:(.text+0x19): undefined reference to 'foo()'
prog.cpp:(.text+0x2d): undefined reference to 'A::~A()'

Now the single quotes are balanced.


Solution

  • Looks like a question that is Not About Programming - but in an interesting historical way it is. Any puzzling quirk of an ancient tool like ld likely had something to do with programming in the first place.

    The use of

    `...' 
    

    to enclose quotation in ld's diagnostics isn't a typo that's never been fixed; it's deliberate. This style of mismatched quotation marks goes back among programmers at least to the 1970s and was still common programmerly style for about the first decade of my professional career, 1985 onward. I'm going to call these M4-style quotes to avoid having to format them in inline markdown, which I don't see how to do.

    It used to be, and still is, good form to consider that any logging or diagnostic text you cause a program to output is apt to be fed into other programs or scripts that are sensitive to quotation and/or shell-expansion and should be appropriately clean. So we avoided, and I hope still avoid, outputting false backtick-expansion quotes,

    `this is not a command`
    

    or multiple styles of quotation, or apostrophe possessives ("Bob's whippet") or apostrophe contractions ("It's a whippet"), because such pollutants would trip up the parsing of quotations ('Bob's whippet hasn't got a snowball's chance in hell').

    It used to be good form also to use M4-style quotes because they made pretty robust quotation parsing much cheaper than matching quotes. In clean text, they make quote-opening and quote-closing distinguishable context-free. It may seem absurd now when compute resources are dirt cheap; but when compute resources were 1000 times more expensive per instruction and 1000 times slower than now, penny-pinching economies like this were pervasive and revered in programming culture, and a few got fossilised in places.

    When nested quotations are considered, the M4-style stretches its lead over the '...' style. Compare:

    'Amy said 'Bob whispered 'Cathy is a fool' to Dave', according to Edith'
    

    and:

    `Amy said `Bob whispered `Cathy is a fool' to Dave', according to Edith'
    

    With '...', it's undetermined whether there are 3 quotations, none nested, separated by unquoted text, or 3 quotations with the 2nd and 3rd each nested in its predecessor. You can't decide between them without deploying additional and less dependable grammar than just: the contents of '...' are in quotation. With M4-style the second parse is the only one.

    That isn't plausibly a big consideration in a modern programmer's work, but historically it mattered to a significant number of programmers in the heyday of the M4 macro-processing language, invented by Brian Kernighan and Dennis Ritchie (also prominent in the invention of Unix and many other enduring Unix tools). M4 recognized the M4-style of string quotation by default, because it was cheap and unambiguous in nesting, and this fact exerted a wide stylistic influence on programmers, which might take the form, "By default I produce clean text as if M4 is going to consume it", or at a further remove, "By default I produce clean text the real programmer's way".

    I'm sure there are tools other than ld that I use that still emit M4-style quotes but none of them spring to mind. Maybe it's a delusion.

    BTW, M4 is not dead: it's still a part of GNU autotools and every autotooled package builder will encounter it, if only as a consumer of M4 macros.
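
To make the nesting argument above concrete, here is a minimal Python sketch of a one-pass parser for M4-style quotes. It is not how M4 or ld actually does it; `parse_m4_quotes` is a hypothetical name invented for this illustration. It assumes "clean text" in the sense described above: no backticks or apostrophes other than the quote marks themselves. The point is that each character decides on its own whether it opens or closes a quotation, so a plain stack is all the state required.

```python
def parse_m4_quotes(text):
    """Parse `...' quotations into a nested list.

    Plain text becomes strings; each quotation becomes a sub-list.
    Assumes clean text: every ` opens and every ' closes a quotation.
    """
    root = []
    stack = [root]   # stack[-1] is the quotation currently being filled
    buf = []         # characters of the current run of plain text

    def flush():
        # Move any buffered plain text into the current quotation.
        if buf:
            stack[-1].append("".join(buf))
            buf.clear()

    for ch in text:
        if ch == "`":            # always an opening quote
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == "'":          # always a closing quote
            if len(stack) == 1:
                raise ValueError("closing quote without an opening quote")
            flush()
            stack.pop()
        else:
            buf.append(ch)

    if len(stack) != 1:
        raise ValueError("unterminated quotation")
    flush()
    return root


sentence = "`Amy said `Bob whispered `Cathy is a fool' to Dave', according to Edith'"
print(parse_m4_quotes(sentence))
```

For the example sentence this yields a single outer quotation containing one nested quotation, which in turn contains another: the only parse available. With '...' on both sides, the same loop could not tell an opening mark from a closing one without extra rules.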