
Read text-file in C++ with fopen without linefeed conversion


I'm working with text files (UTF-8) on Windows and want to read them using C++.

To open the files correctly, I use fopen. As described here, there are two options for opening a file:

  • Text mode "rt" (carriage return + line feed is automatically converted into line feed; in short, "\r\n" becomes "\n").
  • Binary mode "rb" (the bytes are read exactly as they are stored in the file, with no conversion).
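
A minimal sketch to make the difference visible, assuming only the standard <cstdio> API: write_bytes stores a test string on disk unchanged, and count_cr counts how many carriage returns actually reach the program for a given mode. Both function names are hypothetical helpers, not library functions. Note that "t" is a Microsoft extension to fopen modes; plain "r" already means text mode in ISO C, so the sketch uses "r".

```cpp
#include <cstdio>
#include <string>

// Write `data` to `path` byte for byte; "wb" disables newline translation,
// so the "\r\n" pairs land on disk unchanged. Returns false on any error.
bool write_bytes(const char* path, const std::string& data) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written = std::fwrite(data.data(), 1, data.size(), f);
    bool closed = std::fclose(f) == 0;
    return closed && written == data.size();
}

// Count the '\r' characters the program sees when `path` is read with
// the given fopen mode ("r" = text, "rb" = binary).
// Returns -1 if the file cannot be opened.
long count_cr(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (!f) return -1;
    long count = 0;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        if (c == '\r') ++count;
    }
    std::fclose(f);
    return count;
}
```

For example, after write_bytes("crlf_demo.txt", "line1\r\nline2\r\n") (the file name is just a placeholder), count_cr("crlf_demo.txt", "rb") returns 2 on every platform, while count_cr("crlf_demo.txt", "r") typically returns 0 with the Microsoft C runtime on Windows and 2 on Linux or macOS, where text and binary mode behave the same.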

Now it becomes tricky. I don't want to open the file in binary mode, since I would lose the correct handling of my UTF-8 characters (there are special characters in my text files, which are corrupted when interpreted as ANSI characters). But I also don't want fopen to convert all my CR+LF into LF.

Is there a way to combine the two modes, that is, to read a text file into a string without tampering with the line feeds while still reading UTF-8 correctly?

I am aware that the reverse conversion would happen if I wrote the string back out through a file opened the same way, but the string is sent to another application that expects Windows-style line endings.
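
If the file is read in text mode anyway, one workaround is to restore Windows line endings on the string right before it is handed to the other application. The sketch below assumes the string came from a text-mode read; to_crlf is a hypothetical helper, not a library function. Working byte by byte is safe for UTF-8, because every byte of a multi-byte sequence is 0x80 or higher and can never be mistaken for '\r' (0x0D) or '\n' (0x0A).

```cpp
#include <cstddef>
#include <string>

// Convert LF line endings to CRLF. A '\n' that is already preceded by '\r'
// is copied as-is, so applying the function twice gives the same result.
std::string to_crlf(const std::string& in) {
    std::string out;
    out.reserve(in.size() + in.size() / 8);  // rough allowance for added '\r' bytes
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
            out += '\r';  // insert the missing carriage return
        }
        out += in[i];
    }
    return out;
}
```

For example, to_crlf("a\nb\n") yields "a\r\nb\r\n". This does not reproduce the original file exactly if it mixed LF and CRLF line endings; reading in binary mode avoids that problem altogether.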


Solution

  • The only difference between opening a file in text mode and in binary mode is the handling of line-end sequences: text mode translates them, binary mode leaves them untouched. Nothing more, nothing less. Since ASCII characters have the same code points in Unicode, and UTF-8 encodes them as the same single bytes (so every ASCII file is also a valid UTF-8 file), choosing binary or text mode does not affect any of the other bytes. In particular, reading in binary mode will not corrupt your UTF-8 characters.

    It may be worth having a look at James McNellis's "Unicode in C++" presentation at C++Now 2014.
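
Following from this, a minimal sketch that reads a whole file into a std::string with fopen in binary mode. The name read_file_binary and the 4096-byte buffer size are illustrative choices, not taken from the question. The resulting string holds the file's UTF-8 bytes and its "\r\n" pairs exactly as stored, ready to pass on to the other application.

```cpp
#include <cstdio>
#include <string>

// Read the whole file at `path` into `out` without any newline translation.
// "rb" hands over the bytes exactly as stored, so CR LF pairs and UTF-8
// multi-byte sequences both arrive unchanged. Returns false on error.
bool read_file_binary(const char* path, std::string& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    out.clear();
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        out.append(buf, n);  // copy this chunk verbatim
    }
    bool ok = !std::ferror(f);  // distinguish a read error from normal EOF
    std::fclose(f);
    return ok;
}
```

For example, a file containing "café" followed by CR LF yields a string whose last four bytes are 0xC3 0xA9 0x0D 0x0A. If the file begins with a UTF-8 byte order mark (EF BB BF), those three bytes are kept as well, so the caller may need to skip them.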