
Regex to ignore Cobol comment line


I'd like to use a regex to scan a few Cobol files for a specific word while skipping comment lines. Cobol comments have an asterisk in column 7. The regex I've come up with so far, using a negative lookbehind, looks like this:

^(?<!.{6}\*).+?COPY

It matches both lines:

      *     COPY
            COPY

I would assume that .+? somehow overrides the negative lookbehind, but I'm stuck on how to correct this. What do I need to fix to get a regex that only matches the second line?


Solution

  • You may use a lookahead instead of a lookbehind:

    ^(?!.{6}\*).+?COPY
    


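    A lookbehind checks the text to the left of the current location. Placed right after ^, the negative lookbehind required a pattern to be absent before the start of the string, where there is nothing to match, so it was redundant: it always succeeded. Lookaheads check for a pattern to the right of the current location, which is where the first seven columns of the line are.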

    So,

    • ^ - matches the start of the string (or of each line, if the multiline flag is on)
    • (?!.{6}\*) - a negative lookahead that fails the match if the first 6 chars are followed by * (replace . with a space if you need to match just spaces)
    • .+? - matches any 1+ chars other than line breaks, as few as possible, up to the first occurrence of the next subpattern
    • COPY - the literal COPY substring.
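
To see the difference between the two patterns, here is a minimal sketch in Python (the question does not name a regex engine, so the choice of Python's `re` module is an assumption). The sample lines are modeled on the two lines from the question, and `matched_lines` is a hypothetical helper introduced only for this illustration.

```python
import re

# Sample lines modeled on the question: fixed-format Cobol, indicator in column 7.
COMMENT_LINE = " " * 6 + "*" + " " * 5 + "COPY"   # asterisk in column 7
CODE_LINE = " " * 12 + "COPY"                      # regular source line
SAMPLE = COMMENT_LINE + "\n" + CODE_LINE

# re.MULTILINE makes ^ match at the start of every line, not just the string.
LOOKBEHIND = re.compile(r"^(?<!.{6}\*).+?COPY", re.MULTILINE)  # original attempt
LOOKAHEAD = re.compile(r"^(?!.{6}\*).+?COPY", re.MULTILINE)    # suggested fix


def matched_lines(pattern, text):
    """Return the text of every match of `pattern` in `text` (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(matched_lines(LOOKBEHIND, SAMPLE))  # both lines match
    print(matched_lines(LOOKAHEAD, SAMPLE))   # only the non-comment line matches
```

With the multiline flag the same pattern can be run over a whole file's contents at once; without it, the pattern would have to be applied to each line separately.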