c floating-point language-lawyer scanf compiler-bug

Why does scanf parse "2E" but not "." (with GCC) as a "prefix of a matching input sequence" of a float?

Note: The original version of my question was compiler-agnostic and assumed that GCC (which I used to experiment) behaves entirely correctly and that a non-empty prefix of a matching input sequence doesn't lead to a matching failure or input failure. It turns out (see: C17 draft, 7.21.6.2 ¶10) that the answer is more likely to be found in compiler/library bugs than in the intricacies of the definition and proper treatment of prefixes to a match. However, in order to preserve the original spirit of the question, I have edited it only conservatively (therefore, the original assumption still shines through in the latter half of this post's body).

With this in mind, one aspect of this question is still unresolved, namely whether in the %4c example (at the bottom) it is proper for CD to be written into q[].

According to the standard (C17 draft, 6.4.4.2 ¶1), 2E0 (2.0) and .5 (0.5) are valid floating constants, while 2E and . are not.

Yet, with GCC, scanf parses 2E as 2.0, but it doesn't parse . as anything:

#include <stdio.h>

int main(void) {
    float fl;
    char c;

    printf("Please enter a floating-point number: ");
    if (scanf("%f", &fl) == 1)
        printf("<%.2f>\n", fl);
    if (scanf("%c", &c) == 1)
        printf("[%c]\n", c);

    return 0;
}

Intended usage:

Please enter a floating-point number: 123.4qrst
<123.40>
[q]

Here, q is used as a dummy character, to demonstrate how much of the input buffer the previous call to scanf consumed. Entering only a floating-point number will cause c to contain a newline character:

Please enter a floating-point number: 123.4
<123.40>
[
]

Let's try to parse 2E and . as floats:

With GCC (12.2.0, MinGW on Windows), the above code, compiled with gcc -std=c17 -pedantic -Wall -Wextra, produces:

Please enter a floating-point number: 2Eq
<2.00>
[q]

Please enter a floating-point number: .q
[q]

With MSVC (19.35.32217.1), compiling with cl /std:c17 /Wall, I get:

Please enter a floating-point number: 2Eq
[q]

Please enter a floating-point number: .q
[q]

(Let's ignore the fact that it's not clear what floating-point number a string . "should" represent: 0 or 1.)

Let's try to make sense of this. Relevant here seems to be the following clause from the standard (C17 draft, 7.21.6.2 ¶9):

An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.²⁹¹⁾ The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.

²⁹¹⁾fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.

As far as I can tell, the standard's most relevant clause about strtod and friends (here: strtof) in relation to my question is 7.22.1.3 ¶3 (not reproduced here), from which it follows that 2E0 (2.0) and .5 (0.5) are valid "subject sequences" (in the sense of 7.22.1.3 ¶2), while 2E and . are not.
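To make the strtod side of this comparison concrete, here is a minimal sketch. The helper name strtod_consumed is my own invention for illustration; it only wraps the standard strtod call and reports how many characters the subject sequence spans. Unlike fscanf, strtod sees the whole string and can back up as far as it needs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the standard library): converts the
   start of `s` with strtod and returns how many characters the subject
   sequence spans. A return value of 0 means no conversion was performed. */
size_t strtod_consumed(const char *s, double *value)
{
    char *end;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);   /* end points just past the subject sequence */
    return (size_t)(end - s);
}
```

On a conforming library, "2Eq" should give a length of 1 (only the 2 is converted, and E stays unread), while ".q" should give a length of 0 because no conversion is possible.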

If I understand clause 7.21.6.2 ¶9 correctly, only 0-length input items amount to matching failures or input failures. Because each of 2E and . is a valid prefix of a matching input sequence (albeit not a full matching input sequence), neither is a matching failure or input failure.

Hence we can ask: Why does scanf parse 2E as a float, but not . (on GCC)?

This might be related to subtleties surrounding the definition of a "prefix of a matching input sequence".
It is also possible that details around clause 7.22.1.3 ¶3 (regarding strtod/strtof/strtold) are relevant, as they relate to the pushback limit of 1 of scanf and friends.

I believe that the following code might illustrate the notion of a "prefix of a matching input sequence":

#include <stdio.h>

int main(void) {
    char p[5] = "pppp", q[5] = "qqqq";
    int i;

    i = sscanf("ABCD", "%2c%4c", p, q);
    printf("<%s>\n", p);  /* <ABpp> */
    printf("<%s>\n", q);  /* <CDqq> */
    printf("%d\n", i);  /* 2 (GCC), 1 (MSVC) */

    return 0;
}

(Note that, oddly, GCC and MSVC give different results for the number of items assigned, even though both write AB and CD into p[] and q[], respectively.)

Here, even though exactly 4 characters are required for a match for %4c / q (C17 draft, 7.21.6.2 ¶12, item c; irrelevant footnote about multibyte characters not reproduced here)

Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).

CD is a valid "prefix of a matching input sequence", and therefore this code doesn't result in a matching failure or input failure. (Given that assignments can be shorter than the given field width, I find it confusing that the standard uses the word "exactly".)

Or: If GCC or potentially MSVC don't behave correctly, what should the output be, here and for the %f example above?

I found 2 similar questions (listed here in no particular order):

(I believe that this question of mine has broader coverage with the . and %Nc examples.)

Solution

If I understand clause 7.21.6.2 ¶9 correctly, only 0-length input items amount to matching failures or input failures.

7.21.6.2 ¶9 is not the only paragraph that specifies matching failures. Paragraph 10 says:

… If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure…

“.” is a prefix of a matching input sequence, so it is scanned (consumed, removed from the stream), but it is not itself a matching input sequence, so there is a matching failure.
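The two-step rule (paragraph 9 decides how much is read, paragraph 10 decides whether it counts as a match) can be modelled with a small sketch. This is a hypothetical model, not how any particular library implements %f. The function name float_input_item is made up, and it assumes decimal notation only: no leading white space, no field width, no hexadecimal form, no INF or NAN.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not a library function: returns the length of the
   input item that %f would read from `s` under 7.21.6.2 paragraph 9, and
   stores in *matched whether that item is a complete matching sequence
   (paragraph 10). Assumption: decimal notation only. */
size_t float_input_item(const char *s, bool *matched)
{
    size_t i = 0;
    bool mantissa_digits = false;

    assert(s != NULL && matched != NULL);

    if (s[i] == '+' || s[i] == '-')
        i++;                                    /* optional sign */
    while (isdigit((unsigned char)s[i])) {      /* integer part */
        i++;
        mantissa_digits = true;
    }
    if (s[i] == '.') {
        i++;                                    /* decimal point */
        while (isdigit((unsigned char)s[i])) {  /* fraction part */
            i++;
            mantissa_digits = true;
        }
    }
    if (!mantissa_digits) {
        /* "", "+", "." or "+.": at best a prefix, never a match. */
        *matched = false;
        return i;
    }
    *matched = true;
    if (s[i] == 'e' || s[i] == 'E') {
        bool exponent_digits = false;

        i++;                                    /* exponent marker */
        if (s[i] == '+' || s[i] == '-')
            i++;                                /* optional exponent sign */
        while (isdigit((unsigned char)s[i])) {
            i++;
            exponent_digits = true;
        }
        *matched = exponent_digits;             /* "2E" or "2E+" is only a prefix */
    }
    return i;
}
```

Under this model, "2Eq" yields an input item of length 2 that is not a match, and ".q" yields an input item of length 1 that is not a match. In both cases the q is the first unread character, and the directive fails with a matching failure.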

printf("%d\n", i); /* 2 (GCC), 1 (MSVC) */

The MSVC result conforms to the C standard. The GCC result does not; note that it comes from the C library's scanf implementation, not from the compiler itself. For %4c, a matching sequence is “exactly the number [of characters] specified by the field width” (C 2018 7.21.6.2 ¶12). Therefore “CD” is not a matching sequence, only a prefix of one. It should be consumed, and sscanf should treat it as a matching failure. The preceding %2c matched and the %4c did not, so exactly one input item was assigned and the return value should be 1.
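The same reasoning for a single %Nc directive can be written down as a tiny model. Again, this is only a sketch of the reading given above, with a made-up function name (char_input_item); it assumes the input is a plain string, as with sscanf, and ignores multibyte characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one %Nc directive applied to the string `s`:
   returns how many characters are consumed and stores in *matched whether
   exactly `width` characters were available (7.21.6.2 paragraph 12). */
size_t char_input_item(const char *s, size_t width, bool *matched)
{
    size_t available;

    assert(s != NULL && matched != NULL && width > 0);
    available = strlen(s);
    *matched = available >= width;          /* fewer characters: prefix only */
    return *matched ? width : available;
}
```

Applied to "ABCD", a width of 2 consumes AB and matches; applied to the remaining "CD", a width of 4 consumes both characters but does not match, which is why only one assignment should be counted.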