awk gnu gawk character-class

Regex "^[[:digit:]]$" not working as expected in AWK/GAWK


My GAWK version on RHEL is:

gawk-3.1.5-15.el5

I want to print a line only if its first field consists entirely of digits (no special characters, not even spaces).

Example:

echo "123456789012345,3" | awk -F, '{if ($1 ~ /^[[:digit:]]$/)  print $0}'

Output:
Nothing

Expected Output:
123456789012345,3

What is going wrong here? Does my AWK version not understand the GNU character classes? Kindly help.


Solution

  • The pattern /^[[:digit:]]$/ matches only a field that is exactly one digit long. To match multiple digits, add a + after the [[:digit:]] character class, which means one or more digits in $1.

    echo "123456789012345,3" | awk -F, '{if ($1 ~ /^([[:digit:]]+)$/)  print $0}'
    123456789012345,3
    

    which satisfies your requirement.
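
    To make the difference concrete, here is a minimal sketch that wraps the original and the corrected pattern in two shell functions. The function names are hypothetical and used only for illustration, and the sketch assumes an awk that understands POSIX character classes.

    ```shell
    # Hypothetical helpers for illustration; the names are not from the question.

    # Original pattern: the anchors surround a single [[:digit:]], so only a
    # first field that is exactly one digit long can match.
    match_single_digit() {
        awk -F, '$1 ~ /^[[:digit:]]$/'
    }

    # Corrected pattern: + allows one or more digits between the anchors.
    match_all_digits() {
        awk -F, '$1 ~ /^[[:digit:]]+$/'
    }

    echo "7,3"               | match_single_digit   # prints 7,3
    echo "123456789012345,3" | match_single_digit   # prints nothing
    echo "123456789012345,3" | match_all_digits     # prints 123456789012345,3
    ```

    So the character class itself is understood; the original pattern simply never allowed more than one digit.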

    A more idiomatic way (as suggested in the comments) is to drop the explicit if and print; a pattern with no action prints every matching line by default,

    echo "123456789012345,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
    123456789012345,3
    

    Some more examples which demonstrate the same,

    echo "a1,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
    

    (and)

    echo "aa,3" | awk -F, '$1 ~ /^([[:digit:]]+)$/'
    

    do NOT produce any output, as per the requirement.

    A POSIX-compliant way to also enforce a strict length is an interval expression, as below, where {3} requires exactly three digits. The --posix flag is what enables interval expressions in this gawk version.

    echo "123,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'
    123,3
    

    (and)

    echo "12,3" |  awk --posix -F, '$1 ~ /^[0-9]{3}$/'
    

    does not produce any output.
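
    On an awk where interval expressions are unavailable, the same length check can be sketched with length() instead. This is an alternative technique, not the one shown above; the function name and its argument are hypothetical.

    ```shell
    # Hypothetical helper: print lines whose first field is exactly n digits,
    # without relying on {n} interval expressions.
    # Usage: match_n_digits N   (reads comma-separated lines on stdin)
    match_n_digits() {
        awk -F, -v n="$1" '$1 ~ /^[0-9]+$/ && length($1) == n'
    }

    echo "123,3" | match_n_digits 3   # prints 123,3
    echo "12,3"  | match_n_digits 3   # prints nothing
    ```

    The regex guarantees the field is all digits, and length($1) == n pins the count, so the two conditions together behave like /^[0-9]{3}$/ when n is 3.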

    If you are using a reasonably recent bash (3.0 or later), it has a native regex operator, =~, which accepts the same POSIX character classes, something like

    #!/bin/bash
    
    while IFS=',' read -r row1 row2
    do
       [[ $row1 =~ ^([[:digit:]]+)$ ]] && printf "%s,%s\n" "$row1" "$row2"
    done < file
    

    For an input file named file,

    $ cat file
    122,12
    a1,22
    aa,12
    

    The script produces,

    $ bash script.sh
    122,12
    

    Although this works, bash regex matching can be slower. A relatively straightforward alternative using string manipulation would be something like

    while IFS=',' read -r row1 row2
    do
       [[ -z "${row1//[0-9]/}" ]] && printf "%s,%s\n" "$row1" "$row2"
    done < file
    

    The "${row1//[0-9]/}" expansion removes every digit from row1, so the condition is true only if no other characters are left in the variable.
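
    One edge case: an empty row1 also passes the -z test, whereas the + regex requires at least one digit. A minimal sketch of a variant that rejects empty fields too, using a plain case statement so it also runs in POSIX sh (the function name is hypothetical, and the sample rows are fed with printf instead of a file to keep it self-contained):

    ```shell
    # Hypothetical helper: succeeds only if $1 is non-empty and all digits.
    is_all_digits() {
        case $1 in
            '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
            *)             return 0 ;;
        esac
    }

    printf '%s\n' '122,12' 'a1,22' ',12' |
    while IFS=',' read -r row1 row2
    do
        if is_all_digits "$row1"; then
            printf '%s,%s\n' "$row1" "$row2"
        fi
    done
    # prints 122,12
    ```

    The row with an empty first field (",12") is skipped here, while the "${row1//[0-9]/}" test would have printed it.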