compiler-construction lexical-analysis

How is the statement printf("result is %d\n", a); tokenized by a lexical analyser?


In printf("result is %d\n", a); the lexical analyzer takes "result is %d\n" as a single token. How, and at what stage, are %d and \n then recognized as a format specifier and a newline, respectively?


Solution

  • The string literal in the call to printf is an ordinary string literal, no different from any other string literal in your program. printf is an ordinary function, whose first argument is expected to be a string. Nothing requires the first argument of printf to be a string literal; it could be any expression whose value is a pointer to a string. (Many style guides warn against actually doing that, though.) So the following is perfectly legal:

    const char* fmt = "The result is %d\n";
    /* ... */
    
    printf(fmt, a);
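
    The question's premise, that the whole literal arrives as one token, can be illustrated with a minimal lexer sketch. The names tok_kind, token and next_token are hypothetical and exist only for this illustration; a real C lexer handles many more token kinds.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stddef.h>

    /* Hypothetical token kinds for this sketch only. */
    enum tok_kind { TOK_IDENT, TOK_STRING, TOK_PUNCT, TOK_END };

    struct token {
        enum tok_kind kind;
        const char *start;  /* first character of the lexeme in the source */
        size_t len;         /* length of the lexeme in source characters */
    };

    /* Read one token starting at *p and advance *p past it. */
    static struct token next_token(const char **p)
    {
        struct token t;
        const char *s;

        assert(p != NULL && *p != NULL);
        s = *p;
        while (isspace((unsigned char)*s))
            s++;
        t.start = s;
        if (*s == '\0') {
            t.kind = TOK_END;
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            t.kind = TOK_IDENT;
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else if (*s == '"') {
            t.kind = TOK_STRING;
            s++;                      /* opening quote */
            while (*s != '\0' && *s != '"') {
                if (*s == '\\' && s[1] != '\0')
                    s++;              /* an escaped character never ends the literal */
                s++;
            }
            if (*s == '"')
                s++;                  /* closing quote */
        } else {
            t.kind = TOK_PUNCT;       /* single-character punctuation only */
            s++;
        }
        t.len = (size_t)(s - t.start);
        *p = s;
        return t;
    }
    ```

    Calling next_token repeatedly on the statement from the question yields seven tokens: printf, (, the string literal, the comma, a, ) and ;. The literal arrives as a single TOK_STRING lexeme, and nothing in the lexer looks at %d.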
    

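    Inside string literals, the compiler turns escape sequences like \n into the special characters they represent (a newline character in this case) while translating the program, long before it runs. So "\n" is a string literal containing a single character, followed by the terminating null character. %d, by contrast, is not an escape sequence, so it stays in the string as the two ordinary characters % and d.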
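
    A small check of this, assuming a C11 compiler for static_assert; the array name format and the helper percent_d_is_plain_text are made up for the illustration:

    ```c
    #include <assert.h>   /* static_assert (C11) */
    #include <string.h>

    /* Verified by the compiler itself: "\n" is stored as one newline
       character plus the terminating null character. */
    static_assert(sizeof "\n" == 2, "escape sequence is a single character");

    /* An escaped backslash followed by n stays two characters. */
    static_assert(sizeof "\\n" == 3, "backslash and n are two characters");

    static const char format[] = "result is %d\n";

    /* Return nonzero if %d is still present in the stored string as the
       two plain characters '%' and 'd'. */
    static int percent_d_is_plain_text(void)
    {
        return strstr(format, "%d") != NULL && strlen(format) == 13;
    }
    ```

    The static assertions are evaluated during compilation, which shows that the escape sequence has already been converted by then, while %d is left untouched in the stored string.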

    Each time printf is called, it scans the provided format string to identify the format conversions. Clearly, that happens at run-time, not when the program is compiled.
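
    To make the run-time scan concrete, here is a rough sketch of a printf-like function. mini_format is a hypothetical stand-in that understands only %d and %%, and it writes into a caller-supplied buffer; a real printf implementation is far more elaborate.

    ```c
    #include <assert.h>
    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical stand-in for printf: formats into buf (capacity cap)
       and returns the number of characters stored, excluding the null. */
    static size_t mini_format(char *buf, size_t cap, const char *fmt, ...)
    {
        va_list ap;
        size_t n = 0;

        assert(buf != NULL && fmt != NULL && cap > 0);
        va_start(ap, fmt);
        for (const char *p = fmt; *p != '\0' && n + 1 < cap; p++) {
            if (*p != '%') {
                buf[n++] = *p;        /* ordinary character, newline included */
            } else if (p[1] == 'd') {
                /* Only here, at run time, is %d treated as a conversion. */
                int w = snprintf(buf + n, cap - n, "%d", va_arg(ap, int));
                if (w < 0)
                    break;
                n = ((size_t)w < cap - n) ? n + (size_t)w : cap - 1;
                p++;                  /* skip the 'd' */
            } else if (p[1] == '%') {
                buf[n++] = '%';       /* %% produces a single percent sign */
                p++;
            } else {
                buf[n++] = *p;        /* unsupported conversion: copy '%' as is */
            }
        }
        va_end(ap);
        buf[n] = '\0';
        return n;
    }
    ```

    With a 64-byte buffer, mini_format(buf, sizeof buf, "result is %d\n", 42) stores "result is 42" followed by a newline. The function only ever sees a pointer to characters, so it works the same whether the format came from a literal or was assembled while the program was running.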

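    Having said that, since printf is a standard library function with well-defined behaviour, it is legal for a compiler to optimise a call to printf if the format argument is known at compile-time. Some compilers take advantage of this: GCC, for example, can replace printf("hello\n") with puts("hello"), and both GCC and Clang use the same knowledge to warn when the arguments do not match the conversions in a literal format string.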