Parsing the following with awk:
$> df -h /
Filesystem Size Used Avail Use% Mounted on
rootfs 476G 370G 106G 78% /
If I use an explicit match for the G's on the values, it works as expected:
$> awk -v indrive="/dev/sda1" 'NR!=1{gsub(/G/,""); print $2,$4,indrive}' <(df -h /)
476 106 /dev/sda1
However, if I generalize it with a character class:
awk -v indrive="/dev/sda1" 'NR!=1{gsub(/[[:alpha:]]/,""); print $2,$4,indrive}' <(df -h /)
370 78% /dev/sda1
Not sure where 370 and 78% are coming from.
Update: I actually get the same from:
awk -v indrive="/dev/sda1" 'NR!=1{gsub(/[a-zA-Z]/,""); print $2,$4,indrive}' <(df -h /)
370 78% /dev/sda1
But with [[:upper:]] it seems to work fine:
awk -v indrive="/dev/sda1" 'NR!=1{gsub("([[:upper:]])*",""); print $2,$4,indrive}' <(df -h /)
476 106 /dev/sda1
It seems like this is what you're trying to do, using any awk (and cat file in place of your df command for demoing):
$ cat file |
awk -v indrive='/dev/sda1' 'NR>1{$0=$2 FS $4; gsub(/[[:alpha:]]/,""); print $0, indrive}'
476 106 /dev/sda1
or this with GNU awk for gensub():
$ cat file |
awk -v indrive='/dev/sda1' 'NR>1{print gensub(/[[:alpha:]]/,"","g",$2 FS $4), indrive}'
476 106 /dev/sda1
Your code was applying the gsub() across the whole line, which removed $1 and caused $0 to be re-split into different fields, while the above selects the input fields first and then does gsub() on just them.
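To make the re-splitting visible, here is a minimal sketch that feeds the sample df line from the question through awk and prints NF before and after the whole-record gsub(), followed by the new $2 and $4. The variable names line, resplit and before are only for illustration.

```shell
# Sample df line taken from the question (header omitted).
line='rootfs 476G 370G 106G 78% /'

# Save NF, strip every letter from $0 (which makes awk re-split the record),
# then print the old NF, the new NF, and the new $2 and $4.
resplit=$(printf '%s\n' "$line" |
  awk '{ before = NF; gsub(/[[:alpha:]]/, ""); print before, NF, $2, $4 }')

echo "$resplit"
```

This should print 6 5 370 78%: the record starts with 6 fields, has 5 once rootfs is gone, and 370 and 78% have moved into positions 2 and 4.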
Regarding:
Not sure where 370 and 78% are coming from.
They're the 3rd and 5th fields of your input with the G removed; once gsub() deletes rootfs entirely, they become $2 and $4:
Filesystem Size Used Avail Use% Mounted on
rootfs 476G 370G 106G 78% /
            ^^^       ^^^
Regarding:
gsub(/[[:alpha:]]/,"")
......Update: I actually get the same from...
gsub(/[a-zA-Z]/,"")
...
The character ranges a-z and A-Z together cover the same set of alphabetic characters present in your input as [:alpha:] does. In some locales they're identical sets of characters.
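As a quick check, the sketch below applies both patterns to copies of the sample line and compares the results. It forces the C locale with LC_ALL=C, which is an assumption made here so the comparison is predictable; the variable names are hypothetical.

```shell
# Sample df line taken from the question (header omitted).
line='rootfs 476G 370G 106G 78% /'

# Strip letters from two copies of the record, once with explicit ranges and
# once with the POSIX class, then report whether the results are identical.
same=$(printf '%s\n' "$line" | LC_ALL=C awk '{
  byrange = $0
  byclass = $0
  gsub(/[a-zA-Z]/, "", byrange)
  gsub(/[[:alpha:]]/, "", byclass)
  result = (byrange == byclass) ? "same" : "different"
  print result
}')

echo "$same"
```

For this input in the C locale it should print same; in other locales the two patterns can differ for letters outside ASCII, which this input does not contain.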
Regarding your comment:
I thought it was just going to apply gsub to $0 and leave the fields intact if possible. Still not sure why [:upper:] and G alone work as expected but I guess that's another question.
[[:upper:]] (the set of all upper case letters) and G worked because they only match the Gs you want to remove from your input, while [[:alpha:]] matches each of the 6 characters in rootfs (lower case letters) in addition to the Gs and so it also removed that whole first field.
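If you want to keep [[:alpha:]] without disturbing the field positions, one option (a sketch, not the only way) is to pass gsub() its optional third argument so it only touches the fields you care about. The sample line below stands in for the df output, and per_field is just an illustrative variable name.

```shell
# Sample df line taken from the question (header omitted).
line='rootfs 476G 370G 106G 78% /'

# Restrict each gsub() to a single field via its third argument, so $1
# (rootfs) is left alone and the field numbering does not shift.
per_field=$(printf '%s\n' "$line" |
  awk -v indrive='/dev/sda1' '{
    gsub(/[[:alpha:]]/, "", $2)
    gsub(/[[:alpha:]]/, "", $4)
    print $2, $4, indrive
  }')

echo "$per_field"
```

This should print 476 106 /dev/sda1, the same result as the explicit G match in the question.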