
Why should text files end with a newline?


I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years, but I've always wondered: why?


Solution

  • Because that’s how the POSIX standard defines a line:

    3.206 Line
    A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

    Therefore, “lines” not ending in a newline character aren't considered actual lines. That's why some programs have problems processing the last line of a file if it isn't newline terminated.
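
    To make this concrete, here is a minimal sketch of two common ways a tool can "lose" an unterminated last line. The file names and the helper count_read_lines are made up for illustration, and mktemp -d is assumed to be available (it is widespread but not specified by POSIX).

```shell
# Work in a scratch directory so no existing files are touched.
# (mktemp -d is an assumption: common on Linux/BSD/macOS, not in POSIX.)
cd "$(mktemp -d)" || exit 1

printf 'one\ntwo\n' > terminated.txt     # last line ends with <newline>
printf 'one\ntwo'   > unterminated.txt   # last line does not

# wc -l counts <newline> characters, so the unterminated "two" is not counted.
wc -l < terminated.txt      # prints 2
wc -l < unterminated.txt    # prints 1

# Hypothetical helper: count how many times a plain `while read` loop runs.
# read returns non-zero when it reaches end-of-file before seeing a
# <newline>, so the loop body is skipped for an unterminated last line.
count_read_lines() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

count_read_lines terminated.txt      # prints 2
count_read_lines unterminated.txt    # prints 1
```

    Scripts that must tolerate such files often write the loop as while IFS= read -r line || [ -n "$line" ], which is exactly the kind of extra work the convention avoids.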

    The advantage of following this convention is that all POSIX tools expect and use it. For instance, when concatenating files with cat, a file terminated by newline (a.txt and c.txt below) will have a different effect than one without (b.txt):

    $ more a.txt
    foo
    
    $ more b.txt
    bar
    $ more c.txt
    baz
    
    $ cat {a,b,c}.txt
    foo
    barbaz

    We follow this rule for consistency. Doing otherwise would incur extra work when dealing with the default POSIX tools.
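
    The transcript above does not show how the files were created, so here is a sketch that reproduces it with printf and then checks each file's last byte. The helper ends_with_newline is a hypothetical name, not a standard utility, and mktemp -d is again assumed to exist.

```shell
# Scratch directory (mktemp -d is an assumption, not part of POSIX).
cd "$(mktemp -d)" || exit 1

printf 'foo\n' > a.txt   # newline-terminated
printf 'bar'   > b.txt   # no trailing newline
printf 'baz\n' > c.txt   # newline-terminated

cat a.txt b.txt c.txt    # prints "foo", then "barbaz"

# Hypothetical helper: succeeds if a non-empty file ends with <newline>.
# tail -c 1 prints the last byte; command substitution strips trailing
# newlines, so an empty result means the last byte was a <newline>.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

for f in a.txt b.txt c.txt; do
    if ends_with_newline "$f"; then
        echo "$f: ends with newline"
    else
        echo "$f: no trailing newline"
    fi
done
```

    Only b.txt is reported as lacking the trailing newline, which is why its content runs into the first line of c.txt.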


    Think about it differently: If lines aren’t terminated by newline, making commands such as cat useful is much harder: how do you make a command to concatenate files such that

    1. it puts each file’s start on a new line, which is what you want 95% of the time; but
    2. it allows merging the last and first line of two files, as in the example above between b.txt and c.txt?

    Of course this is solvable but you need to make the usage of cat more complex (by adding positional command line arguments, e.g. cat a.txt --no-newline b.txt c.txt), and now the command rather than each individual file controls how it is pasted together with other files. This is almost certainly not convenient.

    … Or you need to introduce a special sentinel character to mark a line that is supposed to be continued rather than terminated. Well, now you’re stuck with the same situation as on POSIX, except inverted (line continuation rather than line termination character).


    Now, on non-POSIX-compliant systems (nowadays that’s mostly Windows), the point is moot: files don’t generally end with a newline, and the (informal) definition of a line might for instance be “text that is separated by newlines” (note the word “separated” rather than “terminated”). This is entirely valid. However, for structured data (e.g. programming code) it makes parsing minimally more complicated: it generally means that parsers have to be rewritten. And if a parser was originally written with the POSIX definition in mind, then it might be easier to modify the token stream rather than the parser; in other words, to add an “artificial newline” token to the end of the input.
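
    At the level of a shell pipeline, the "artificial newline" idea can be approximated by normalizing the input before it reaches a line-oriented consumer. The following is only a sketch: ensure_final_newline is a hypothetical name, and it relies on the behavior of common awk implementations, which treat an unterminated final line as a record and print every record followed by a newline. It is meant for text, not binary data.

```shell
# Hypothetical filter: copy standard input to standard output, adding a
# final <newline> only when the last line lacks one. awk re-emits each
# input record followed by its output record separator (a newline).
ensure_final_newline() {
    awk '{ print }'
}

# The second line of this input is unterminated; after the filter,
# wc -l sees two complete lines.
printf 'x = 1\ny = 2' | ensure_final_newline | wc -l    # prints 2
```

    Input that is already terminated passes through unchanged, and empty input stays empty, so the filter is safe to put in front of any consumer that assumes the POSIX definition.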