Tags: c, compilation, c-preprocessor, digraphs, trigraphs

Are digraphs transformed by a compiler and trigraphs transformed by a preprocessor?


I'm trying to understand how trigraphs and digraphs work, not to use them.

I've read that post and I understood that:

  • Converting trigraphs to corresponding characters shall always be done by the preprocessor, before the actual compilation starts.
  • Converting digraphs to corresponding characters shall be performed by the compiler.

Is this true?


Solution

  • Trigraph sequences are indeed replaced with the corresponding characters in the first phase of translation, before the preprocessor's lexer analyses the character stream to produce preprocessing tokens.

    The very next phase handles escaped newlines, i.e., instances of \ immediately followed by a newline, which are removed from the character stream. Note that the \ can be produced by the first phase as the replacement for the ??/ trigraph.

    The lexer then analyses the character stream to produce preprocessing tokens. For example, [ and <: are alternate spellings of the same token, just as 1e1 and 1E1 are alternate spellings of the same number. Hence <: is never rewritten as [. It is a different sequence of characters that produces the same token.

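    Trigraphs cannot be produced by token pasting with the ## preprocessor operator during macro expansion, because trigraph replacement has already taken place by then, but digraphs can: pasting < and : yields the single token <:.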

    Here is a small sample program to illustrate this process, including the special handling of the ??/ trigraph, which expands to \ and thus can be used to split a digraph across two lines:

    #include <stdio.h>
    
    #define STR(x) #x
    #define xSTR(x) STR(x)
    #define glue(a,b) a##b
    
    int main() {
        puts(STR(??!));
        puts(STR('??!'));
        puts(STR("??!"));
    
        puts(STR(<:));
        puts(STR('<:'));
        puts(STR("<:"));
    
        puts(STR(<\
    :));
        puts(STR(<??/
    :));
        puts(STR('<\
    :'));
        puts(STR("<\
    :"));
    
        puts(STR(glue(<,:)));
        puts(xSTR(glue(<,:)));
        return 0;
    }
    

    Output:

    chqrlie $ make lexing && ./lexing
    clang -O3 -funsigned-char -std=c11 -Weverything -Wwrite-strings  -lm -o lexing lexing.c
    lexing.c:8:14: warning: trigraph converted to '|' character [-Wtrigraphs]
        puts(STR(??!));
                 ^
    lexing.c:9:15: warning: trigraph converted to '|' character [-Wtrigraphs]
        puts(STR('??!'));
                  ^
    lexing.c:10:15: warning: trigraph converted to '|' character [-Wtrigraphs]
        puts(STR("??!"));
                  ^
    lexing.c:18:15: warning: trigraph converted to '\' character [-Wtrigraphs]
        puts(STR(<??/
                  ^
    4 warnings generated.
    |
    '|'
    "|"
    <:
    '<:'
    "<:"
    <:
    <:
    '<:'
    "<:"
    glue(<,:)
    <: