c, windows, notepad

How are line breaks handled under Windows, and why does that affect text rendering in some editors?


I copied an excerpt from a PDF and pasted it into Sublime Text. The excerpt came with line breaks:

With line break

I wrote a small C program to remove the line breaks.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(void) {
    FILE *in = fopen("feynman.txt", "r");
    FILE *out = fopen("feynmanStripped.txt", "w");
    assert(in != NULL && out != NULL);

    /* Copy every character except '\n' to the output file. */
    int c = fgetc(in);
    while (c != EOF) {
        if (c != '\n')
            fputc(c, out);
        c = fgetc(in);
    }

    fclose(in);
    fclose(out);
    return 0;
}

The program was executed under Cygwin.

I opened the resulting text in Sublime Text and in Notepad.

The line breaks disappear in Notepad, but not in Sublime Text. I also tried reading and writing in "rb"/"wb" mode, but it made no difference.

I guess it might be due to how Windows deals with '\n' and '\r', which affects how Sublime Text and Notepad render the text. What is going on under the hood?

(Note: I also copied and pasted the same text into MS Word; the result is the same as in ST.)


Solution

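  • Interesting. Yes, this is related to the whole \r\n thing; there's no other sane explanation unless your files were mangled in strange ways. Your input evidently contains \r\n line endings, so stripping only '\n' leaves bare '\r' characters behind, which Sublime Text still treats as line breaks while Notepad apparently does not. It is a little odd, though, that this happens when you're opening files in text mode under Windows; you'd expect the mapping to be done for you. There's something rotten in the state of Cygwin, and fortunately it is documented here.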
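One way to check the \r\n theory before changing anything is to count the line-ending bytes in the input. The sketch below is only an illustration: the struct `eol_counts` and the function `count_line_endings` are hypothetical names, not part of the original program, and the stream is assumed to be opened in binary mode ("rb") so that no newline translation hides the '\r' bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: tallies of each kind of line ending. */
struct eol_counts {
    long crlf;     /* "\r\n" pairs (Windows style) */
    long lone_cr;  /* '\r' not followed by '\n' */
    long lone_lf;  /* '\n' not preceded by '\r' */
};

/* Scan `in` to EOF and count its line endings.
   Assumes `in` was opened in binary mode so bytes arrive untranslated. */
struct eol_counts count_line_endings(FILE *in)
{
    assert(in != NULL);
    struct eol_counts n = {0, 0, 0};
    int prev_cr = 0;  /* was the previous byte a '\r'? */
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            if (prev_cr)
                n.crlf++;
            else
                n.lone_lf++;
            prev_cr = 0;
        } else {
            if (prev_cr)
                n.lone_cr++;  /* the earlier '\r' stood alone */
            prev_cr = (c == '\r');
        }
    }
    if (prev_cr)
        n.lone_cr++;  /* file ended on a bare '\r' */
    return n;
}
```

If the theory holds, feynman.txt should report mostly "\r\n" pairs, and feynmanStripped.txt should report only bare '\r' bytes.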

    Long story short: you're not opening the file in text mode. This is surprising because the C standard says your code should be doing that, but Cygwin strays from the path here. Instead, you're opening the file in what the Cygwin docs call default mode. Whether default mode does the Windows newline mapping depends on several things, such as whether the file path is given as a UNIX or Windows path, how the file system that a UNIX path resolves to is "mounted," how you linked the program, and so on (follow the link for details). This leaves you several ways to resolve the problem:

    • Link with -ltextmode, which forces default mode to mean text mode.
    • fopen("feynman.txt", "rt") (note: this is not, strictly speaking, standard-conforming code).
    • fopen(".\\feynman.txt", "r") (force a Windows path).
    • if(c != '\n' && c != '\r') ...
    • Maybe others, but those are the ones I can think of right now.
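The fourth option is the least dependent on how Cygwin resolves the file mode, since it drops both characters no matter what translation happened. A minimal sketch follows; `strip_line_breaks` is a hypothetical helper name, not something from the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every byte from `in` to `out` except '\n' and '\r'.
   Returns the number of bytes dropped, or -1 on a read/write error. */
long strip_line_breaks(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    long dropped = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' || c == '\r') {
            dropped++;  /* skip both halves of a "\r\n" pair */
            continue;
        }
        if (fputc(c, out) == EOF)
            return -1;  /* write failed */
    }
    return ferror(in) ? -1 : dropped;
}
```

To use it, open the files with fopen("feynman.txt", "rb") and fopen("feynmanStripped.txt", "wb"), call strip_line_breaks(in, out), then fclose both. With this check the result should no longer depend on text versus binary mode. That would also explain why switching to "rb"/"wb" alone changed nothing: the '\r' bytes were still being copied through.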