I'm trying to define a proper regular expression that will validate a field.
The field is 26 chars long and may contain only: letters (lower or uppercase), whitespace ( ), commas (,), hyphens (-) or forward slashes (/).
The program should:
Identify whether or not there's an improper char in field $3 via if ( $3 !~ /regex/ ) and, if so, show the improper chars (in this example: $ and *) via the showChars() function.
Current code:
awk '
function showChars(fieldIn) {
    split(fieldIn,chars,"")
    for ( i=1; i<=length(chars); i++ ) {
        if (chars[i] !~ regex) {
            print "Invalid char found:" chars[i]
        }
    }
}
BEGIN {
    FS=""
    FIELDWIDTHS="4 4 26"
    regex="[a-zA-Z/, \t-]$"
}
{
    if ( $3 !~ /regex/ ) {
        print "Line " NR ": Problem in field"
        print "$3:"$3
        showChars($3)
        next
    } else {
        print "Line " NR ": OK"
        next
    }
}
' $filename
This code enters the if for every line, but showChars() doesn't always print any invalid chars, which makes me wonder why it entered the if in the first place.
Example input file ($filename): line 1 has an invalid field and line 2 a valid one. Each line ends right after the 4+4+26 char fields, so $3 is padded with trailing blanks to 26 chars:
!!!!----JOHN,DOE/-SMITH $*
!!!!----ANA,DE/LACROIX
This may be what you're trying to do, using GNU awk (gawk) for various extensions such as FIELDWIDTHS:
awk '
function showchar(fieldIn, chars,numChars,i) {
    numChars = split(fieldIn,chars,"")
    for ( i=1; i <= numChars; i++ ) {
        if ( chars[i] !~ chrRegex ) {
            print "Invalid char found:" chars[i]
        }
    }
}
BEGIN {
    FIELDWIDTHS="4 4 26"
    chrRegex = "[[:alpha:][:space:],/-]"
    fldRegex = "^(" chrRegex "){26}$"
}
{
    if ( $3 ~ fldRegex ) {
        print "Line " NR ": OK"
    }
    else {
        print "Line " NR ": Problem in field"
        print "$3:"$3
        showchar($3)
    }
}
' "$filename"
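As for why your original script entered the if for every line: /regex/ between slashes is a regexp constant that matches the literal text "regex", not the contents of your regex variable (to use the variable you write $3 !~ regex). On top of that, "[a-zA-Z/, \t-]$" is anchored only at the end, so even as a variable it tests just the last char, which in your blank-padded field is a valid blank. That is also why showChars() printed nothing for valid lines: each single char matched. A small sketch comparing the three variants on the example data (demo_regex_bugs is a hypothetical name; it should run with any POSIX awk):

```shell
#!/bin/sh
# Hypothetical demo function: tests the same text three different ways.
demo_regex_bugs() {
    # Line 1 keeps trailing blanks, like the padded $3 in the real file.
    printf '%s\n' 'JOHN,DOE/-SMITH $*  ' 'ANA,DE/LACROIX' | awk '
    BEGIN {
        regex    = "[a-zA-Z/, \t-]$"     # original pattern: anchored only at the end
        fldRegex = "^[a-zA-Z/, \t-]+$"   # anchored at both ends
    }
    {
        # /regex/ is a regexp constant: it matches the literal word "regex"
        c = ($0 ~ /regex/)  ? "ok" : "problem"
        # the variable is used, but only the last char (a blank on line 1) is tested
        v = ($0 ~ regex)    ? "ok" : "problem"
        # every char must be an allowed one, so the $ and * are caught
        a = ($0 ~ fldRegex) ? "ok" : "problem"
        print "Line " NR ": constant=" c " variable=" v " anchored=" a
    }'
}

demo_regex_bugs
```

The constant reports "problem" for both lines and the end-anchored variable reports "ok" for both; only the pattern anchored at both ends tells them apart. (The extra showchar() parameters chars,numChars,i above are the usual awk idiom for local variables, so they no longer leak out as globals.)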
The whole showchar() function could be replaced by just this, though:
print "Invalid char(s) found:", gensub(chrRegex,"","g",$3)
e.g.
$ cat tst.sh
#!/usr/bin/env bash
filename="$1"
awk '
BEGIN {
    FIELDWIDTHS="4 4 26"
    chrRegex = "[[:alpha:][:space:],/-]"
    strRegex = "^" chrRegex "{26}$"
}
{
    if ( $3 ~ strRegex ) {
        print "Line " NR ": OK"
    } else {
        print "Line " NR ": Problem in field"
        print "$3:"$3
        print "Invalid char(s) found:", gensub(chrRegex,"","g",$3)
    }
}
' "$filename"
$ ./tst.sh file
Line 1: Problem in field
$3:JOHN,DOE/-SMITH $*
Invalid char(s) found: $*
Line 2: OK
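gensub() and FIELDWIDTHS are gawk-only. If you ever need this in another awk, a sketch along the same lines can cut out the third field with substr() and run gsub() on a copy of it in place of gensub(). It also checks the 26-char width with length() instead of a {26} interval expression, which some older awks don't support. check_fields is a hypothetical name, and the ASCII bracket expression is an assumption standing in for [[:alpha:][:space:]] in awks that lack POSIX character classes:

```shell
#!/bin/sh
# Portable (POSIX awk) sketch: no FIELDWIDTHS, no gensub().
# Assumption: field 3 is columns 9-34 of each line (the 4+4+26 layout).
check_fields() {
    awk '
    BEGIN {
        # ASCII-only stand-in for [[:alpha:][:space:],/-]
        chrRegex = "[A-Za-z \t,/-]"
    }
    {
        f3 = substr($0, 9, 26)     # emulates FIELDWIDTHS="4 4 26"
        bad = f3
        gsub(chrRegex, "", bad)    # delete every valid char; whatever is left is invalid
        if (length(f3) == 26 && bad == "") {
            print "Line " NR ": OK"
        } else {
            print "Line " NR ": Problem in field"
            print "Invalid char(s) found:", bad
        }
    }' "$@"
}

# Pad each example line to 34 chars, like the real fixed-width file.
printf '%-34s\n' '!!!!----JOHN,DOE/-SMITH $*' '!!!!----ANA,DE/LACROIX' | check_fields
```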
By the way, don't write negative conditions if you can avoid it. You had:
if ( !whatever ) {
    do_foo
}
else {
    do_bar
}
so ask yourself: under what condition do I call do_bar? It's "if it is NOT true that NOT whatever is true", an inscrutable double negative. Just avoid using !s or other negative logic to keep your code clear and simple:
if ( whatever ) {
    do_bar
}
else {
    do_foo
}