I try to use awk
to filter some rows based on whether a given column contains an integer, floating point or empty string.
I have problems with the integers or the way to add them to the rule but avoiding adding weird expressions like 2.e
.
220802,2249,3
220802,2250,37.5
220802,2251,2.e
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2255,3.a
220802,2256,123.25
220802,2257,1.32e-8
220802,2258,N/A
220802,2249,3
220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8
I try this, but fails:
awk -F , '$3 ~ /(^$|[0-9]+\.?[0-9]+e?[-+]?[0-9]?+)/' INPUT
220802,2250,37.5
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2256,123.25
220802,2257,1.32e-8
Also try this:
awk -F , '$3 ~ /(^$|[0-9]+|[0-9]+\.?[0-9]+e?[-+]?[0-9]?+)/'
220802,2249,3
220802,2250,37.5
220802,2251,2.e
220802,2252,28
220802,2253,
220802,2254,19.5
220802,2255,3.a
220802,2256,123.25
220802,2257,1.32e-8
Any aproach to consider? Thanks in advance!
Checking for empty is easy. You could use $3~/^$/
(as in your regex) but in this case, as you will also test if the field is a number, it's even shorter to logically "not" it, making use of the fact that only empty string and numbers equal to zero (0
, 0.0
, -0
, etc) are false.
To test if awk considers a field (or variable) to be a number, you can compare it to itself plus zero.
So:
awk -F, '$3~/^$/ || ($3+0 == $3)' INPUT
or:
awk -F, '!$3 || ($3+0 == $3)' INPUT
The reason your regex didn't work is that you didn't anchor it. For example, [0-9]+
matches if there is a digit, even when it is embedded in something that is not a number.
Matching all valid representations of number is quite error-prone.