
Should I open every file in C as a binary file?


I've read somewhere that we should always open files in C in binary mode (even text files). At the time (a few years ago) I didn't care much about it, but now I really need to understand whether that's true, and why.

I've been searching for information on this, but the most I can find is how opening them differs, not how the files themselves differ structurally.

So I guess my question is: why should we always open a file as binary even if we know beforehand that it's a text file? My second question is about the structure of the files themselves: is a binary file like an "encrypted" text file?


Solution

  • The names "text" vs. "binary", while quite mnemonic, can sometimes leave you wondering which one to apply. It's best to translate them to their underlying mechanics, and choose based on which one of those you need.

    "Binary" could also be called "verbatim" opening mode. Each byte in the file will be read exactly as-is on disk. Which means that if it's a Windows file containing the text "ABC" on one line (including the line terminator), the bytes read from the file will be 65 66 67 13 10.

    "Text" mode could also be called "line-terminator translating" opening mode. When the file contains a sequence of 1 or more characters which is defined by the platform on which you're running as "line terminator"(1), the entire sequence will be read from the file, but the runtime will make it appear as if only the character '\n' (10 when using ASCII) was read. For the same Windows-file above, if it was opened as a text file on Windows, the bytes read from the file would be 65 66 67 10.

    The same applies when writing: a file opened as "binary" for writing will receive exactly the bytes you give it. A file opened as "text" will have each '\n' byte (10 in ASCII) translated to whatever the platform defines as the line-terminating character sequence.

    I don't think an "always do this" rule can be distilled from the above, but perhaps you can use it to make an informed decision for each case.
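
    To see the translation for yourself, here is a minimal sketch that writes the line "ABC" in a chosen mode and then reads the file back verbatim. The helper names write_abc, read_raw and print_bytes are made up for this illustration, and the byte values assume an ASCII-compatible character set.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: write the one-line text "ABC\n" using the given
       fopen mode ("w" for text, "wb" for binary).
       Returns 0 on success, -1 on failure. */
    int write_abc(const char *path, const char *mode)
    {
        FILE *f = fopen(path, mode);
        if (f == NULL)
            return -1;
        int ok = fputs("ABC\n", f) >= 0;
        if (fclose(f) != 0)
            ok = 0;
        return ok ? 0 : -1;
    }

    /* Hypothetical helper: read up to cap bytes verbatim ("rb") into buf
       and return how many bytes were read (0 if the file cannot be opened). */
    size_t read_raw(const char *path, unsigned char *buf, size_t cap)
    {
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return 0;
        size_t n = fread(buf, 1, cap, f);
        fclose(f);
        return n;
    }

    /* Hypothetical helper: print the byte values separated by spaces. */
    void print_bytes(const unsigned char *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            printf("%u%s", (unsigned)buf[i], i + 1 < n ? " " : "\n");
    }
    ```

    Writing with mode "wb" and dumping the result with read_raw and print_bytes should give 65 66 67 10 on every platform. Writing with mode "w" should give the same four bytes on a Unix-style system, but 65 66 67 13 10 on Windows, because the runtime expands '\n' on the way out.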


    (1) On Unix-style systems, the line-terminating character sequence is LF (ASCII 10). On Windows, it's the two-character sequence CR LF (ASCII 13 10). On classic Mac OS (before Mac OS X), it was the single character CR (ASCII 13).
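
    If you receive a file and don't know which of these conventions it uses, opening it in binary mode lets you look at the terminators directly. The following is only a sketch: struct eol_counts and count_eols are hypothetical names, and the code assumes an ASCII-compatible character set, so that '\r' is 13 and '\n' is 10.

    ```c
    #include <stdio.h>

    /* Hypothetical result type: how often each terminator style occurs. */
    struct eol_counts {
        long lf;    /* lone LF, Unix style */
        long crlf;  /* CR LF pair, Windows style */
        long cr;    /* lone CR, classic Mac OS style */
    };

    /* Hypothetical helper: read the file verbatim ("rb") and count its
       line terminators. Returns 0 on success, -1 if it cannot be opened. */
    int count_eols(const char *path, struct eol_counts *out)
    {
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return -1;

        out->lf = out->crlf = out->cr = 0;

        int c;
        int prev_cr = 0;  /* was the previous byte a CR not yet counted? */
        while ((c = getc(f)) != EOF) {
            if (c == '\n') {
                if (prev_cr)
                    out->crlf++;  /* CR followed by LF */
                else
                    out->lf++;    /* LF on its own */
                prev_cr = 0;
            } else {
                if (prev_cr)
                    out->cr++;    /* the earlier CR stood alone */
                prev_cr = (c == '\r');
            }
        }
        if (prev_cr)
            out->cr++;            /* file ended with a lone CR */

        fclose(f);
        return 0;
    }
    ```

    This only works in binary mode: in text mode on Windows the runtime would already have collapsed each CR LF pair into a single '\n' before the loop saw it.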