shell unix nawk

Function call in Nawk command


I am not sure how to call a function from within a nawk command. I have given the input and the output I want below. The function should validate column 3 and return true or false. If the column satisfies the condition, the line should go to the good file; if not, it should go to the bad file. Can you help me modify the nawk command to achieve this?

I know the length validation can be done in a single statement, but my validate function is just sample code. I want to do more than a length check in the validate function.

input.txt:

1 | I | 123  | KK
3 | U | 3456 | JJ
6 | B | 241  | YH

outputgood.txt:

3 | U | 3456 | JJ

outputbad.txt:

1 | I | 123  | KK
6 | B | 241  | YH

Script:

#!/bin/sh
#function validation

function validate(){
in = $1
if length(in) > 3
  return true
else
 return false
}

nawk -F '|' 'function validate($3){print}' input.txt > outputgood.txt

Solution

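  • First off, you've got a shell function which you're trying to call from within your awk script. That can't work: awk runs as a separate process with its own language, so it cannot see functions defined in the calling shell.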

    If your validation must be in shell, then perhaps you can do the whole thing in shell.

    #!/bin/sh
    
    while read line; do
    
      var=${line#* | }                              # take off first field
      var=${var#* | }                               # take off second field
      var=${var% | *}                               # take off fourth field
      var=`expr "$var" : ' *\(.*[^ ]\) *$'`         # trim whitespace (expr anchors at the start itself)
    
      if [ ${#var} -gt 3 ]; then
        echo "$line" >> outputgood.txt
      else
        echo "$line" >> outputbad.txt
      fi
    
    done < input.txt
    

    We're splitting the line using parameter expansion because $IFS doesn't let us use variable amounts of whitespace. Alternatively, you could still do this using positional parameters, which may also give you easier access to the other fields. Note that you still need to trim if you're using field length as a condition.

    #!/bin/sh
    
    IFS="|"
    while read line; do
    
      set -- $line
      var=`expr "$3" : ' *\(.*[^ ]\) *$'`           # trim whitespace (expr anchors at the start itself)
    
      if [ ${#var} -gt 3 ]; then
        echo "$line" >> outputgood.txt
      else
        echo "$line" >> outputbad.txt
      fi
    
    done < input.txt
    

    If what you're really interested in is whether the third field is greater than 1000, then test for that directly rather than testing the length of the field. Clarity in programming is like clarity in anything else. Don't obfuscate if you can avoid it.
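
    As a rough illustration of testing the value rather than its length, here is a sketch of the positional-parameter loop with a numeric comparison. It assumes the third field is always a plain non-negative integer padded with spaces, and it recreates the question's sample input so that it runs on its own; the variable name num is just illustrative.

    ```shell
    #!/bin/sh
    # Recreate the sample input from the question.
    printf '%s\n' '1 | I | 123  | KK' '3 | U | 3456 | JJ' '6 | B | 241  | YH' > input.txt

    # Start from empty output files so reruns do not mix results.
    : > outputgood.txt
    : > outputbad.txt

    IFS="|"
    set -f                                  # no globbing, so a "*" in the data stays literal
    while read -r line; do

      set -- $line                          # split the line on "|" into $1, $2, ...
      num=$(printf '%s' "$3" | tr -d ' ')   # strip the padding spaces

      # Compare the value itself (assumption: field 3 is always an integer).
      if [ "$num" -gt 1000 ]; then
        printf '%s\n' "$line" >> outputgood.txt
      else
        printf '%s\n' "$line" >> outputbad.txt
      fi

    done < input.txt
    ```

    With the sample data, only the 3456 line passes the test; a non-numeric third field would make the comparison fail with an error, so real input may need an extra check first.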

    Note that we could do this with a little less code in bash, but your question just specified "shell" so I'm assuming /bin/sh.
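
    If the validation does not have to live in the shell, another option is to define the function inside the awk program itself, since awk (nawk included) supports user-defined functions. The following is a minimal sketch, not a definitive solution: it assumes the fields are padded only with spaces or tabs, it calls awk where nawk could be substituted, and it keeps the question's length check as a stand-in for the real validation. The parameter name val is illustrative.

    ```shell
    #!/bin/sh
    # Recreate the sample input from the question.
    printf '%s\n' '1 | I | 123  | KK' '3 | U | 3456 | JJ' '6 | B | 241  | YH' > input.txt

    # Make sure both output files exist even if one receives no lines.
    : > outputgood.txt
    : > outputbad.txt

    awk -F'|' '
    # validate() is defined inside the awk program, so awk can call it.
    # It returns 1 (true) when the trimmed value is longer than 3 characters.
    function validate(val) {
        gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim padding; only the local copy changes
        return length(val) > 3
    }
    {
        if (validate($3))
            print > "outputgood.txt"
        else
            print > "outputbad.txt"
    }' input.txt
    ```

    Scalars are passed to awk functions by value, so trimming inside validate() leaves the original line untouched when it is printed. Further checks can be added to the function body without changing the main block.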