How to use awk to compare numbers and build a list - using Awk on macOS with CRLF line endings


I'm trying to list the values in the output of my knife command that are greater than a given threshold. I'm trying to do this with awk; I've been researching examples and came up with the command below. However, it does not produce the output I expect.

For example with this command, I get the following output:

knife ssh -x foobar -a ec2.local_ipv4 "chef_environment:prod AND roles:db_cluster AND AND ipaddress:10.1.*" 'netstat -na | grep EST | wc -l'

Output:

10.1.3.129 2273
10.1.3.130 2533
10.1.3.131 1981
10.1.2.133 1965

Now I want to use awk to keep only the values (2nd column, without the IPs) that are greater than 2000.

I tried the following awk statement, but to no avail:

knife ssh -x foobar -a ec2.local_ipv4 "chef_environment:prod AND roles:db_cluster AND AND ipaddress:10.1.*" 'netstat -na | grep EST | wc -l' \
| awk '{if ($2 > 2000) print $2; else echo "Nothing to print"}'

Output:

10.1.3.129 2273
10.1.3.130 2533
10.1.3.131 1981
10.1.2.133 1965

Expected output:

2273
2533

Solution

  • tl;dr

    The simplest approach is to remove \r instances from the output before passing it to awk:

    knife ... | tr -d '\r' | awk ...
    

    This assumes that \r instances only occur as part of \r\n pairs to designate line endings, which is generally the case.
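Applied to the goal in the question, the pipeline could look like the following sketch. Here `sample_output` is a hypothetical stand-in for the real knife command, and the threshold of 2000 is taken from the question:

```shell
# Hypothetical stand-in for the `knife ssh ...` command: emits CRLF-terminated lines.
sample_output() {
  printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n10.1.3.131 1981\r\n10.1.2.133 1965\r\n'
}

# Remove every CR, then print the 2nd field of each line whose 2nd field exceeds 2000.
sample_output | tr -d '\r' | awk '$2 > 2000 { print $2 }'
```

With the sample data this prints 2273 and 2533, each on its own line.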


    From your comments, we now know that your input has Windows-style CRLF (\r\n) line endings and that you're on macOS Sierra (10.12).

    That said, your sample output is inconsistent with the awk command in your question: the command prints only the 2nd field, yet the output shows both columns.

    Leaving that issue aside, there are two basic approaches:

    • (a) Translate \r\n (CRLF) sequences to just \n (LF) first.

    • (b) Work around the issue by modifying Awk's input-record separator.


    The following examples use simplified input and a simplified command to focus on the core issue:

    • printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n' produces 2 CRLF-terminated (\r\n-terminated) input lines containing 2 whitespace-separated fields each.

    • awk '{ print $2 }' | cat -e, or a variation thereof, prints the 2nd whitespace-separated field from each line using awk; cat -e visualizes control characters in the output: $ represents a \n (LF) character (the end of the line in Unix terms), and other control characters are shown in caret notation as ^<letter>; therefore, \r (CR) is represented as ^M.

      • By default, the \r is included in the output, because awk does not treat it as field-separating whitespace, which is clearly undesired. The output looks as follows, where ^M indicates the unwanted inclusion of \r:

        2273^M$
        2533^M$
        
      • With an effective solution, the \r would not be included in the output, and the output would look as follows (note the absence of ^M):

        2273$
        2533$
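
The unwanted default behavior can be reproduced with the simplified input and command described above. This is only a sketch; `crlf_sample` is a hypothetical helper name introduced here for illustration:

```shell
# Hypothetical helper: emits the two CRLF-terminated sample lines.
crlf_sample() {
  printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n'
}

# No CR handling at all: the CR stays attached to the last field,
# and cat -e shows it as ^M in front of the end-of-line marker $.
crlf_sample | awk '{ print $2 }' | cat -e
```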
        

    Solutions based on approach (a):

    Most typically, the dos2unix utility is used to translate Windows-style line breaks to Unix-style ones, but it doesn't come with macOS.
    It's easy to install via Homebrew, however.
    Then use knife ... | dos2unix | awk ....
    (Alternatively, send the output to a file first and convert that file in place before further processing: dos2unix file.)
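
If a script has to run both on machines with and without dos2unix, one possible arrangement is a small wrapper that falls back to tr. This is a sketch only; `to_lf` is a hypothetical helper name, and the two branches are not strictly equivalent, since tr removes every \r while dos2unix converts line endings:

```shell
# to_lf: hypothetical helper that reads CRLF text on stdin and writes LF text to stdout.
# Uses dos2unix as a filter when it is installed, otherwise falls back to tr.
to_lf() {
  if command -v dos2unix >/dev/null 2>&1; then
    dos2unix
  else
    tr -d '\r'
  fi
}

printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n' | to_lf | awk '{ print $2 }' | cat -e
```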

    Alternatively, brought to you by the Shameless Self-Promotion Department, you can install my nws CLI; if you have Node.js installed, install it by simply running [sudo] npm install -g nws-cli and then use knife ... | nws --lf | awk ....
    (Alternatively, send output to a file first and update that file in-place before further processing:
    nws --lf -i file; nws can also translate from LF to CRLF and offers other whitespace-related functions.)

    There are also fairly simple ways to use stock macOS utilities - see this answer of mine.

    The simplest solution with stock utilities is to use tr to blindly remove any \r instances:

    $ printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n' |
        tr -d '\r' | awk '{ print $2 }' | cat -e
    2273$
    2533$
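
If an extra process in the pipeline is undesirable, the \r can also be stripped inside awk itself. The following is a sketch and not necessarily what the linked answer shows; `second_field_lf` is a hypothetical helper name. Unlike tr, it removes only a \r at the very end of each line:

```shell
# Hypothetical helper: strip a trailing CR from each record, then print the 2nd field.
# Modifying $0 with sub() makes awk re-split the record into fields.
second_field_lf() {
  awk '{ sub(/\r$/, "") } { print $2 }'
}

printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n' | second_field_lf | cat -e
```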
    

    Solution based on approach (b):

    $ printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n' |
        awk -v RS='\r' 'NF {print $2}' | cat -e
    2273$
    2533$
    

    Note how -v RS='\r' defines \r as RS, the input-record separator, which means that it is automatically excluded from each record (line) that awk reads and splits into fields.

    NF, placed as a condition before the action ({...}), is necessary to eliminate the empty line that results from reading the final \n as a separate record.

    • This could be avoided if we could define RS as \r\n, but, sadly, the BSD Awk on macOS doesn't support multi-character input-record separators (in line with the POSIX spec.).
      Via Homebrew, however, you could install GNU Awk, which does support such separators, which would simplify the command to:
      gawk -v RS='\r\n' '{print $2}'
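
Returning to the filter from the question, approach (b) with a single-character RS could be sketched as follows. `filter_above_2000` is a hypothetical helper name, and the threshold of 2000 is taken from the question:

```shell
# Hypothetical helper: treat CR as the record separator, skip empty records (NF),
# and print the 2nd field of each record whose 2nd field exceeds 2000.
filter_above_2000() {
  awk -v RS='\r' 'NF && $2 > 2000 { print $2 }'
}

printf '10.1.3.129 2273\r\n10.1.3.130 2533\r\n10.1.3.131 1981\r\n' |
  filter_above_2000 | cat -e
```

The leading \n left at the start of every record after the first does no harm, because awk's default field splitting treats newlines as whitespace.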