regex, grep, pcregrep

Match a floating-point number that is not inside parentheses


I'm trying to match a pattern in a bunch of files with grep. The files contain G-code (CNC machine code). Every number should have a letter associated with it (for example: X4.5, G71, Z-0.75). Many files have typos and are missing the letters. I'm trying to use grep to identify these files by matching any decimal number that is not immediately preceded by a letter. However, I do not want to match the pattern when it occurs within parentheses. Anything in parentheses is a comment and should not be matched by the regex.

test text:

%
O01934 (AWC C011469)
(MATL: 4.0 X 2.0 X A020)
N90 G00 4.2 z0.1
Z0.1125 F0.004 
N150 X2.2 .01 (inline comment)
0.03

Line 3 technically contains the pattern I'm looking for, but I don't want to match it because it's within parentheses.

Lines 4, 6, and 7 are examples of the pattern I'm trying to match: numbers not preceded by a letter and not inside parentheses.

I've been on regextester.com for well over an hour and I've got a headache now. Maybe someone more seasoned with regex can help.

The best pattern I could figure out is ([[:space:]]|^)-?[[:digit:]]*\.[[:digit:]]+([[:space:]]|$), which matches what I want on lines 4, 6, and 7 but also matches the numbers in the comment on line 3. I can't figure out how to match one but not the other.


Solution

  • Your regex can be fixed and used as

    pcregrep -o '\([^()]*\)(*SKIP)(*F)|(?<!\S)-?\d*\.\d+(?!\S)' file
    

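    The \([^()]*\)(*SKIP)(*F) part matches any substring between the closest pair of parentheses, including the parentheses themselves. (*F) then forces that match to fail, and (*SKIP) makes the engine resume searching after the closing parenthesis, so nothing inside parentheses can be matched. The lookarounds (?<!\S) and (?!\S) stand in for ([[:space:]]|^) and ([[:space:]]|$): they require whitespace or a line boundary on each side of the number without consuming it, and -o prints only the matched number.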

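    If you only need to avoid matches after a letter, replace (?<!\S) with (?<!\p{L}). Note that this lookbehind alone still lets the pattern start in the middle of a valid word (for example, it matches .5 in X4.5 and 0.75 in Z-0.75), so you will probably want to exclude digits and the minus sign as well, for example (?<![\p{L}\d-]).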
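
    Engines without the (*SKIP)(*F) verbs can approximate the same idea with a capture group: let the first alternative consume a whole comment, capture the number in the second alternative, and keep only the matches where the capture group is set. The following is a minimal Python sketch of that variant, not part of the original answer. The names PATTERN, bare_decimals, and SAMPLE are made up for illustration, and the sample text is the test text from the question.

```python
import re

# Alternative 1 consumes a whole "(...)" comment without capturing anything.
# Alternative 2 captures a decimal that is delimited by whitespace or a
# line boundary on both sides.
PATTERN = re.compile(r"\([^()]*\)|(?<!\S)(-?\d*\.\d+)(?!\S)")


def bare_decimals(text):
    """Return decimals that occur outside parentheses as standalone tokens."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


# Test text from the question (hypothetical constant name).
SAMPLE = """%
O01934 (AWC C011469)
(MATL: 4.0 X 2.0 X A020)
N90 G00 4.2 z0.1
Z0.1125 F0.004
N150 X2.2 .01 (inline comment)
0.03
"""

if __name__ == "__main__":
    print(bare_decimals(SAMPLE))  # ['4.2', '.01', '0.03']
```

    Because the comment alternative is tried first at every opening parenthesis, the numbers 4.0 and 2.0 on line 3 are swallowed together with the comment and never reach the capture group. Like the pcregrep pattern, this sketch assumes comments are not nested and do not span multiple lines.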