shell awk sed sh posix

Replacing newlines with the string '\n' using POSIX tools


Yes, I know there are a number of questions (e.g. (0) or (1)) that seem to ask the same thing, but AFAICS none of them really answers what I want.

What I want is to replace every occurrence of a newline (LF) with the string \n, without any implicitly assumed newlines, using only POSIX utilities (no GNU extensions or Bashisms), with input read from stdin and no buffering of that input.

So for example:

  • printf 'foo' | magic should give foo
  • printf 'foo\n' | magic should give foo\n
  • printf 'foo\n\n' | magic should give foo\n\n

The usually given answers don't do this, e.g.:

  • awk
    printf 'foo' | awk 1 ORS='\\n' gives foo\n, whereas it should give just foo,
    so it adds a \n when there was no newline.
  • sed
    would work for just foo, but not in any other case, for example:
    printf 'foo\n' | sed ':a;N;$!ba;s/\n/\\n/g' gives foo, whereas it should give foo\n,
    so it misses one final newline.
    Since I do not want any sort of buffering, I cannot just check whether the input ended in a newline and then add the missing one manually.
    And anyway... it would use GNU extensions.
    sed -z 's/\n/\\n/g'
    does work (even retains the NULs correctly), but again, GNU extension.
  • tr
    can only replace with one character, whereas I need two.

The only working solution I'd have so far is with perl:
perl -p -e 's/\n/\\n/'
which works just as desired in all cases, but as I've said, I'd like to have a solution for environments where just the basic POSIX utilities are there (so no Perl or using any GNU extensions).

Thanks in advance.


Solution

  • The following will work with all POSIX versions of the tools used and with any characters permitted in a POSIX text file as input, whether or not a terminating newline is present:

    $ magic() { { cat -u; printf '\n'; } | awk -v ORS= '{print sep $0; sep="\\n"}'; }
    
    $ printf 'foo' | magic
    foo$
    
    $ printf 'foo\n' | magic
    foo\n$
    
    $ printf 'foo\n\n' | magic
    foo\n\n$
    

    The function first appends a newline to the incoming piped data so that what awk reads is a valid POSIX text file (which must end in a newline); this guarantees that it works in every POSIX-compliant awk. The awk command then discards the terminating newline that we added and replaces all the others with "\n", as required.
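
For readers who find the one-liner dense, here is the same pipeline spread over several lines with comments. It is only a sketch of the function above, not a different method; the final `od -c` call is an addition for inspecting the exact bytes produced and is not part of the original answer.

```shell
# Same pipeline as the one-line magic above, laid out with comments.
magic() {
    {
        cat -u          # pass stdin through without delay
        printf '\n'     # append one newline so awk always sees a complete last line
    } |
    awk -v ORS= '
        # ORS is empty, so print adds nothing after each record.
        # sep is empty before the first record and a literal backslash-n afterwards,
        # so each newline in the input becomes \n and the appended one is dropped.
        { print sep $0; sep = "\\n" }
    '
}

# Dump the result byte by byte to confirm that no real newline is emitted.
printf 'foo\n\n' | magic | od -c
```

Because the separator is printed before each record rather than after it, awk never has to know whether the current record is the last one, which is what lets it work as a stream.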

    The only utility above that has to process input without a terminating newline is cat. POSIX just talks about "files" as input to cat, not "text files" as in the awk and sed specs, so every POSIX-compliant version of cat can handle input without a terminating newline.
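
To check the cases from the question plus two edge cases (empty input and a lone newline), a small harness along the following lines may help. The `check` helper is hypothetical and not part of the answer. It appends a sentinel `x` inside the command substitution because `$(...)` strips trailing real newlines, which would otherwise hide exactly the kind of bug being tested for.

```shell
magic() { { cat -u; printf '\n'; } | awk -v ORS= '{print sep $0; sep="\\n"}'; }

# check FORMAT EXPECTED
# FORMAT is passed to printf as its format string (so \n there is a real newline);
# EXPECTED is compared literally (so \n there is a backslash followed by n).
check() {
    # The trailing x protects any real trailing newline from being stripped.
    actual=$(printf "$1" | magic; printf x)
    if [ "$actual" = "${2}x" ]; then
        printf 'ok:   %s\n' "$2"
    else
        printf 'FAIL: expected %s, got %s\n' "$2" "${actual%x}"
    fi
}

check 'foo'     'foo'
check 'foo\n'   'foo\n'
check 'foo\n\n' 'foo\n\n'
check ''        ''
check '\n'      '\n'
```

Each line of output should start with `ok:` if the function behaves as described in the answer.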