
How are backslash escape sequences implemented in compilers?


How are backslash escape sequences implemented in compilers? If we write "\n" in a string, how does the compiler replace it with a newline character? How does it replace "\b" with a backspace character?

I ask because I wrote the code:

#include<stdio.h>
main()
{
    printf("Hello \c");
}

The output was:

Hello 
Exited: ExitFailure 7 

I ran it on codepad while working through question 1.2 in the K&R book.

Thanks in Advance


Solution

  • To understand this, you need to know a little about how compilers work in general. The first step a compiler usually performs is called lexical analysis (or lexing for short), in which it breaks the input code into pieces (tokens) that it can recognize, typically by matching them against regular expressions. One of those pieces is a string literal, which is a quoted string like "Hello". A regular expression for a string literal looks something like "([^"\\]|\\"|\\\\|\\n|\\b)*". In English, that means: an opening double quote, then zero or more repetitions of one of 1) any character that isn't a double quote or a backslash, 2) a backslash and then a double quote, 3) a backslash and then another backslash, 4) a backslash and then an n, or 5) a backslash and then a b, and finally a closing double quote. (Note: in real compilers, the list of characters that can follow the backslash is generally longer.) Matching this pattern is how the compiler finds string literals.

    Then, once the string literal has been identified, the compiler makes a second pass over it to work out which characters to actually put in memory. It reads the literal from start to end, looking for backslash sequences, and replaces each one with the character it stands for: \" becomes ", \\ becomes \, \n becomes a newline, \b becomes a backspace character, and so forth. To decide what to substitute, it simply looks the sequence up in a table that maps each escape sequence to its replacement character.