Count trailing newlines with POSIX utilities or GNU coreutils or Perl


I'm looking for ways to count the number of trailing newlines in possibly binary data, using POSIX or GNU coreutils utilities or maybe Perl. The data is either:

  • read from standard input, or
  • already in a shell variable (in which case "binary" of course excludes at least the NUL byte, 0x00).

This should work without temporary files or FIFOs.

When the input is in a shell variable, I already have the following (possibly ugly but) working solution:

original_string=$'abc\n\ndef\n\n\n'
string_without_trailing_newlines="$( printf '%s' "${original_string}" )"
printf '%s' $(( ${#original_string}-${#string_without_trailing_newlines} ))

which gives 3 in the above example.

The idea above is simply to subtract the string lengths and use the "feature" of command substitution that it discards any trailing newlines.
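
The same idea can be carried over to standard input. The following is only a minimal sketch, not a tested general solution: the function name `count_trailing_newlines` and the sentinel character `x` are assumptions made for illustration. The sentinel is appended inside the command substitution so that the newlines of the input itself are not stripped, and is removed again afterwards. The whole input is held in a shell variable, so NUL bytes are lost (or cause a warning), and the count relies on `${#var}` treating both strings consistently in the current locale.

```shell
# Hypothetical helper: count the trailing newlines of whatever arrives on stdin.
# The body runs in a subshell, so the variables do not leak into the caller.
count_trailing_newlines() (
    # The sentinel "x" keeps command substitution from stripping the
    # trailing newlines of the input itself.
    data=$(cat; printf x)
    data=${data%x}
    # A second command substitution strips the trailing newlines.
    stripped=$(printf '%s' "$data")
    # The difference in length is the number of trailing newlines.
    printf '%s\n' "$(( ${#data} - ${#stripped} ))"
)

printf 'a\n\nb\n' | count_trailing_newlines
```

Because the data passes through a variable, this variant is only suitable for input that fits comfortably in memory.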

Test-Cases:

printf ''             |  function   results in: 0
printf '\n'           |  function   results in: 1
printf '\n\n'         |  function   results in: 2
printf '\n\n\n'       |  function   results in: 3
printf 'a'            |  function   results in: 0
printf 'a\n'          |  function   results in: 1
printf 'a\n\n'        |  function   results in: 2
printf '\na\n\n'      |  function   results in: 2
printf 'a\n\nb\n'     |  function   results in: 1

For the special cases where NUL is part of the input (which can only happen when reading from stdin, not when the string is passed in a shell variable), the result is undefined, but should typically be one of the following:

printf '\n\x00\n\n'   |  function   results in: 1
printf 'a\n\n\x00\n'  |  function   results in: 2

that is, counting the newlines up to the NUL

or:

printf '\n\x00\n\n'   |  function   results in: 2
printf 'a\n\n\x00\n'  |  function   results in: 1

that is, counting the newlines after the NUL

or:

printf '\n\x00\n\n'   |  function   results in: 3
printf 'a\n\n\x00\n'  |  function   results in: 3

that is, ignoring any NULs, as long as they are right before, within, or right after the trailing newlines

or:
giving an error


Solution

  • Using GNU awk (for its RT variable, so awk in the commands below must be gawk), without reading all of the input into memory at once:

    $ printf 'abc\n\ndef\n\n\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    3
    
    $ printf 'a\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    1
    
    $ printf 'a' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    0
    
    $ printf '' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    0
    
    $ printf '\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    1
    
    $ printf '\n\n' | awk '/./{n=NR} END{print NR-n+(n && (RT==RS))}'
    2
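
Where GNU awk is not available, a different technique (not the one used in the answer above) might be to convert the byte stream to hex with `od` and let any POSIX awk count the run of `0a` bytes at the end. This is a hedged sketch only; the function name `count_trailing_newlines_od` is hypothetical. It streams the input, and because every byte is seen as a two-digit hex value, a NUL simply resets the count like any other non-newline byte.

```shell
# Hypothetical helper: count trailing newline bytes on stdin using only
# POSIX od and awk.
count_trailing_newlines_od() {
    # od -An -v -tx1 prints every byte as two hex digits, without offsets
    # and without collapsing repeated lines.
    od -An -v -tx1 |
    awk '{
        # Each field is one byte; "0a" is a newline, anything else resets.
        for (i = 1; i <= NF; i++)
            if ($i == "0a") n++; else n = 0
    }
    END { print n + 0 }'
}

printf 'a\n\nb\n' | count_trailing_newlines_od
```

With this sketch, the NUL cases from the question behave as "counting the newlines after the NUL".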