How do escape sequences actually behave in the scanf() function?


I've searched for an explanation but haven't found a proper one (or maybe I'm just bad at searching).

I'm now learning the scanf function from the stdio.h header. I've read that escape sequences (ES) are not recommended in its format string because they 'confuse' the input, but what actually happens? I tested some code and got these results for different ES:

#include <stdio.h>

int main(void)
{
    int a, b, c;

    printf("Enter values: \n");
    scanf("%d%d%d\n", &a, &b, &c);
    printf("The values are %d, %d and %d.\n", a, b, c);
}

Enter values: 
1
2
3
randomstuff
The values are 1, 2 and 3.

The same happens with the '\t' sequence – it asks for one more value (I typed 'randomstuff'), which is not stored anywhere. '\a' and '\b', on the other hand, behave differently and do not affect the input:

    ...
    scanf("%d%d%d\a", &a, &b, &c);
    printf("The values are %d, %d and %d.\n", a, b, c);
}

Enter values: 
1
2
3
The values are 1, 2 and 3.

It becomes more confusing when I mess around with the position of the ES. '\n' and '\t' stop affecting the input:

    ...
    scanf("%d\t%d\n%d", &a, &b, &c);
    printf("The values are %d, %d and %d.\n", a, b, c);
}

Enter values: 
1
2
3
The values are 1, 2 and 3.

whereas '\a' and '\b' start having some influence:

    ...
    scanf("%d\b%d%d", &a, &b, &c);
    printf("The values are %d, %d and %d.\n", a, b, c);
}

Enter values: 
1    
The values are 1, 0 and 32766.

Some say ES are not interpreted by scanf() and are simply taken as plain characters, but that does not seem true either, since it does interpret ES such as \' or \" in accordance with the rules.

So what is going on? P.S. Why are there 0 and 32766 in the last output?


Solution

  • When scanf receives a string, there are no escape sequences in it. If a string literal was used as an argument, the escape sequences in it were processed when the program was translated (compiled). C 2018 6.4.4.4 specifies how escape sequences are processed:

    • \', \", \?, and \\ become ', ", ?, and \, respectively.
    • \a, \b, \f, \n, \r, \t, and \v become characters for alert, backspace, formfeed, new line, carriage return, horizontal tab, and vertical tab, respectively.
    • \d, \dd, and \ddd, where each d is an octal digit, become the character with that value.
    • \x followed by one or more hexadecimal digits becomes the character with that value.

    Then for scanf, the characters have meanings:

    • If the character is a white-space character (space, horizontal tab, new line, vertical tab, formfeed, carriage return), it directs scanf to read input up to the first non-white-space character (which remains unread) or until no more characters can be read.
    • If the character is % (which can result from an octal or hexadecimal escape sequence), it introduces a conversion specification, such as %d.
    • Otherwise, it directs scanf to read the next character and match it literally (an alert character in the format string must match an alert character in the input stream, and so on).

    The same happens with the '\t' sequence – it asks for one more value (I typed 'randomstuff'), which is not stored anywhere.

    \t in a string literal becomes a (horizontal) tab, and a tab character is a white-space character, so it directs scanf to read input until it sees a non-white-space character or cannot get more input. The new line that ended your “3” line is itself white space, so scanf consumed it and kept waiting. This is why, when you added \t, scanf kept reading until it saw the non-white-space character “r” of your “randomstuff”.

    '\a' and '\b', on the other hand, behave differently and do not affect the input:

    These are not white-space characters and are not %, so they direct scanf to attempt to match them literally with characters in the input stream. For "%d%d%d\a", scanf matched the three %d specifications, and then it read one more character to attempt to match it to the alert character. That failed, so scanf stopped and returned 3 for the three successful matches. (scanf also “put back” the non-matching character into the input stream.)

    It becomes more confusing when I mess around with the position of the ES. '\n' and '\t' stop affecting the input:

    In "%d\t%d\n%d", the tab and new line characters direct scanf to skip white-space characters. However, %d already skips leading white space as part of its conversion, so a white-space directive immediately before %d has no effect.

    whereas '\a' and '\b' start having some influence:

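    With "%d\b%d%d", the first decimal numeral is processed, and then the backspace directs scanf to match a backspace character. Since there is no backspace character in the input, matching fails, and scanf stops and returns 1 without ever assigning b or c. Because b and c were never initialized, their values are indeterminate, which is where the 0 and 32766 in your output come from.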

    Note that none of these behaviors involves scanf processing escape sequences. scanf processes the characters it receives, such as tab, alert, and backspace. It never sees the original escape sequences.