Tags: c, windows, unix, newline, fgets

What does fgets() read?


I have to migrate a program from UNIX to Windows. The program reads data from a serial port. On the original UNIX machine that receives the data, the reading code looks like this:

char   my_buffer_a[200];

memset(my_buffer_a, '\0', sizeof(my_buffer_a));

if (fgets(my_buffer_a, sizeof(my_buffer_a), file_p))
{
    n = sscanf(my_buffer_a, "%d\t%d\t...%d\n0x0A",
               my_record.field1, my_record.field2, ...);
    ....
}

I know that fgets() reads until EOF or a newline character, so I would expect my_buffer_a to end with '\n'. Yet to my surprise, the format string also has 0x0A after the \n! (Does that mean TWO newline characters?) What is going on here?

My other worry is the definition of the newline character, which is not the same on Unix and on Windows. After the conversion, will my sscanf have to become sscanf("%d\t%d\t...%d\r\n" (or sscanf("%d\t%d\t...%d\r0x0D\n0x0A", if I keep that weird logic)?


Solution

  • A newline in a scanf format string does NOT specifically look for a newline in the input -- it just skips whitespace (any whitespace, in any amount, including none). The same goes for tab characters. In addition, most scanf conversion specifiers (everything except %c, %[ and %n) skip leading whitespace on their own. So in general:

    • You should never use \t, \r or \n in a format string (except within %[..]), as it will just confuse people. Use a space instead, which does the same thing.

    • You should avoid redundant spaces in the format string (for example, a space directly before %d), because they don't do anything and will just confuse people.

    So with the above, your sscanf call becomes:

    sscanf(my_buffer_a, "%d%d ...%d 0x0A",
    

    This makes it clear that the call reads two numbers, followed by three periods, followed by another number, and finally the literal string 0x0A (4 characters of text, not a newline byte). There may be any amount of whitespace (or none) between any of those five items, and it will be ignored.
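To make the whitespace and literal-text rules concrete, here is a minimal sketch. The helper names (parse_pair, parse_pair_tabs, has_literal_suffix) are made up for illustration and are not part of the original program. Two fields stand in for the full field list, which the question elides.

```c
#include <stdio.h>

/* Hypothetical helper: parse two integers with a whitespace-free format.
   Returns the number of fields sscanf converted (0, 1 or 2). */
int parse_pair(const char *line, int *a, int *b)
{
    /* %d skips leading whitespace on its own, so no \t or \n is needed. */
    return sscanf(line, "%d%d", a, b);
}

/* Same parse written in the tab/newline style from the question.
   The \t and \n here only mean "skip any whitespace". */
int parse_pair_tabs(const char *line, int *a, int *b)
{
    return sscanf(line, "%d\t%d\n", a, b);
}

/* Hypothetical helper: returns 1 only if the text "0x0A" really follows
   the number. sscanf's return value alone cannot tell, because a failed
   literal match after the last conversion does not lower the count. */
int has_literal_suffix(const char *line, int *value)
{
    int end = -1;   /* stays -1 unless sscanf reaches the %n directive */
    int n = sscanf(line, "%d 0x0A%n", value, &end);
    return n == 1 && end >= 0;
}
```

For instance, parse_pair("12\t34\n", &a, &b) and parse_pair_tabs("12   34", &a, &b) should both return 2 with a == 12 and b == 34, while has_literal_suffix("7\n", &v) returns 0 because the input has a newline where the format wants the four characters 0x0A.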

    As far as Windows vs Unix is concerned: on Windows, lines may end with an extra carriage return (\r) character. But that is just whitespace, so scanf skips it like any other whitespace, and you can simply ignore it.
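As a quick check of the line-ending point, the sketch below parses the same record with a Unix and a Windows line ending. The names parse_two and strip_eol are hypothetical, and two fields again stand in for the real list. strip_eol is only an optional extra, assuming the buffer is also used somewhere other than sscanf (say, compared with strcmp), where a trailing \r would matter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record parser: returns the number of converted fields. */
int parse_two(const char *line, int *a, int *b)
{
    /* A trailing "\n" or "\r\n" is simply left unread; it cannot break
       the conversions because both fields are already parsed. */
    return sscanf(line, "%d%d", a, b);
}

/* Hypothetical helper: cut the buffer at the first '\r' or '\n', so the
   same code handles "\n" and "\r\n" endings. */
void strip_eol(char *line)
{
    /* strcspn returns the length of the prefix containing neither
       character; if none is found it is strlen(line), so this is safe. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

With these, parse_two("1\t2\n", &a, &b) and parse_two("1\t2\r\n", &a, &b) should give the same result (2, with a == 1 and b == 2), and strip_eol turns "abc\r\n" into "abc".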