unix newline line-endings

What is the difference between CRLF and LF in effect and meaning?


I know the difference between line feed and carriage return. Carriage return moves the cursor to the beginning of the line, and line feed moves the cursor to the next line without returning to the beginning of that line. But doesn't the cursor go to the beginning of the next line when using line feed? Then why do people say that LF only moves the cursor to the next line without going to the beginning, and that CRLF goes to the beginning of the line and then to the next line? When using a programming language like C to print "Hello\nHELLO", doesn't it automatically print HELLO at the beginning of the next line? Then how is there a difference between CRLF and LF?


Solution

  • It is confusing because Windows and Unix treat these characters differently. Start with what a program actually emits:

    linux-$ printf "abc\ndefg\rhij\r\n" | hexdump -C
    00000000  61 62 63 0a 64 65 66 67  0d 68 69 6a 0d 0a        |abc.defg.hij..|
    0000000e
    

    As you can see, the program just emits these characters (0x0d - CR, 0x0a - LF) as instructed, either to a pipe (| hexdump -C), a file (which is where you'd use dos2unix or unix2dos), or the terminal.
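    The same check can be made without a shell. The following is a minimal Python sketch (the variable names are only illustrative) that dumps the bytes of the same string. Nothing is translated at this stage; the 14 bytes match the final offset 0x0e in the hexdump output above.

```python
# The same string as in the printf example, encoded to raw bytes.
data = "abc\ndefg\rhij\r\n".encode("ascii")

# bytes.hex() with a separator needs Python 3.8 or newer.
hex_dump = data.hex(" ")

print(hex_dump)   # 61 62 63 0a 64 65 66 67 0d 68 69 6a 0d 0a
print(len(data))  # 14, i.e. 0x0e
```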

    If we emit it to a terminal, then you start seeing the difference:

    linux-$ printf "abc\ndefg\rhij\r\n"
    abc
    hijg
    

    This is from URXVT on Linux, configured to use the regular Unix convention (which differs from the literal meaning of the two control characters).

    In this convention, \n sent to the terminal is treated as \r\n (on Linux this is normally done by the tty layer's onlcr output setting), and so abc\ndefg\rhij\r\n effectively becomes abc\r\ndefg\rhij\r\n.

    That's why when the terminal displays it, it prints abc, returns to the beginning of the line, drops to the next line, prints defg, returns to the beginning of the line, does NOT drop to the next line, overwrites the line's first 3 characters with hij, then returns to the beginning of the line, and drops to the next line.
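    The walk-through above can be replayed with a toy model of a screen. This is only a sketch, not how any real terminal emulator is written: render and its lf_implies_cr parameter are hypothetical names, and the model knows nothing beyond CR, LF, and printable characters.

```python
def render(data, lf_implies_cr=True):
    """Replay CR/LF handling on an imaginary screen and return its lines.

    lf_implies_cr=True models the Unix convention (LF acts like CR LF);
    lf_implies_cr=False models the literal meaning of LF (move down only).
    """
    rows = [[]]          # one list of characters per screen line
    row = col = 0        # cursor position
    for ch in data:
        if ch == "\r":
            col = 0                      # carriage return: back to column 0
        elif ch == "\n":
            row += 1                     # line feed: down one line
            if row == len(rows):
                rows.append([])
            if lf_implies_cr:
                col = 0                  # Unix convention: also return
        else:
            line = rows[row]
            while len(line) <= col:      # pad with spaces up to the cursor
                line.append(" ")
            line[col] = ch               # overwrite whatever was there
            col += 1
    return ["".join(line) for line in rows]


print(render("abc\ndefg\rhij\r\n"))                       # ['abc', 'hijg', '']
print(render("abc\ndefg\rhij\r\n", lf_implies_cr=False))  # ['abc', 'hijdefg', '']
```

    With the literal interpretation, defg starts in column 3 of the second line, which is why hij no longer overwrites it.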

    So to answer your specific questions:

    But doesn't the cursor go to the beginning of the next line when using line feed?

    "cursor" assumes that the device handling the output is a terminal. So the answer depends on how the terminal is configured. The answer is "yes" on Unix terminals. Windows and DOS terminals aren't configured to do that.

    Then why do people say that LF only moves the cursor to the next line without going to the beginning, and that CRLF goes to the beginning of the line and then to the next line?

    Because that's how Windows and DOS terminals treat LF and CRLF.
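    On the Unix side, whether LF is treated as CRLF is a setting of the terminal device, so it can be inspected. The sketch below assumes a Unix-like system (Python's termios module is not available on Windows) and uses a hypothetical helper name, lf_becomes_crlf. It reads the output flags of a file descriptor and checks ONLCR, the flag that stty displays as onlcr.

```python
import os
import termios  # Unix-only module


def lf_becomes_crlf(fd):
    """Return True or False for a terminal, or None if fd is not a terminal."""
    if not os.isatty(fd):
        return None  # pipes and regular files receive the bytes unchanged
    oflag = termios.tcgetattr(fd)[1]  # index 1 holds the output flags
    # ONLCR only takes effect when output post-processing (OPOST) is enabled.
    return bool(oflag & termios.OPOST) and bool(oflag & termios.ONLCR)


# File descriptor 1 is standard output; the result depends on where it points.
print(lf_becomes_crlf(1))
```

    Running this with the output piped into another program is expected to print None, which mirrors the hexdump example: only a terminal applies the convention.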

    When using a programming language like C to print "Hello\nHELLO", doesn't it automatically print HELLO at the beginning of the next line?

    Ah, now that's a great question. There's a sneaky little rule there:

    When writing to a file, device node, or socket/fifo in text mode, \n is transparently translated to the native newline sequence used by the system, which may be longer than one character. When reading in text mode, the native newline sequence is translated back to \n. In binary mode, no translation is performed, and the internal representation produced by \n is output directly.

    You see, the convention (namely, the default configuration) on Unix terminals is to treat \n as \r\n, so sending Hello\nHELLO from C to a Unix terminal also returns the cursor to the beginning of the line when \n is encountered.

    On Windows, on the other hand, the text-mode output functions silently translate each \n you write into \r\n, and the terminal treats the two characters separately, as you'd expect.

    On old Macs, where the newline was \r, \n was silently translated to \r, and the terminal dropped to the next line whenever it returned the cursor to the beginning of the line.
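    The text-mode translation described above can be reproduced on any platform in Python, whose open() takes a newline argument that selects what each written \n is turned into. This is only a sketch of the idea, not how the C library does it; written_bytes is a hypothetical helper name.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting and
    return the raw bytes that ended up in the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" or "\r": each "\n" written is translated to that string.
        # newline="\n" (or ""): no translation is performed on output.
        with open(path, "w", newline=newline, encoding="ascii") as f:
            f.write(text)
        # Binary mode never translates, so this shows what was really stored.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(written_bytes("Hello\nHELLO", "\n"))    # Unix style:    b'Hello\nHELLO'
print(written_bytes("Hello\nHELLO", "\r\n"))  # Windows style: b'Hello\r\nHELLO'
print(written_bytes("Hello\nHELLO", "\r"))    # old Mac style: b'Hello\rHELLO'
```

    In all three cases the program itself only ever wrote \n; the difference comes entirely from the translation layer.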

    Then how is there a difference between CRLF and LF?

    • LF is the programming language standard for "new line" across all systems whereas CR is an artifact of the program that displays the text, usually a terminal.
    • Some systems (Windows) silently translate LF to CRLF in some situations, but treat CR regularly.
    • Some systems (old Macs) silently translate LF to CR in some situations.
    • Some terminals (Unix) treat LF as CRLF, but treat CR regularly.
    • Some terminals (old Macs) treat CR as CRLF, but treat LF regularly.
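    To find out which of these conventions a given file or byte stream follows, the three kinds of line ending can simply be counted. A minimal sketch, assuming the data is already in memory as bytes; count_line_endings is a hypothetical helper name.

```python
def count_line_endings(data):
    """Count CRLF pairs, lone LFs, and lone CRs in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                     # Windows/DOS style
        "LF": data.count(b"\n") - crlf,   # Unix style (LF not preceded by CR)
        "CR": data.count(b"\r") - crlf,   # old Mac style (CR not followed by LF)
    }


print(count_line_endings(b"abc\ndefg\rhij\r\n"))  # {'CRLF': 1, 'LF': 1, 'CR': 1}
```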