c++ regex qregularexpression

Match an expression but don't match lines that start with a #


I am using Qt. I have a text string in which I look specifically for a function call, xyz.set_name(). I want to capture the last occurrence of this call, but skip it if the line that contains it starts with a #. So far I have a regex that matches the function call, but I don't know how to exclude the lines that start with #, I don't know how to capture only the last occurrence, and I don't know why all the matches are put into one capture group.

[().\w\d]+.set_name\(\)\s*

This is what I want it to do:

abc.set_name() // match
# abc.set_name() // don't match
xyz.set_name() // match and capture this one

Update, for clarification:

My text reads like this when printed with qDebug():

Hello\nx=y*2\nabc.set_name()   \n#xyz.set_name()

It is one long string, with \n as the newline.

Update: here is a longer test string. I have tried all the suggested regexes on it, but none of them worked, and I don't know what is missing. https://regex101.com/r/vXpXIA/1

Update 2: Scratch my first update; the \n is just how qDebug() prints a newline, so it doesn't need special handling in the regex.


Solution

  • If you merely want to match the last line that matches the pattern

    ^[a-z]+\.set_name\(\)
    

    you can use the regular expression:

    (?smi)^[a-z]+\.set_name\(\)(?!.*^[a-z]+\.set_name\(\))
    

    For simplicity I've used the character class [a-z]; change it to suit your requirements. In the question it is [().\w\d], which can be simplified to [().\w] because \w already includes the digits.

    Note that since the substring of interest is being matched, there is no point in capturing it as well. Lines that begin with '#' need no special handling: ^[a-z]+ requires the line to start with a character from the class, so a line such as '# abc.set_name()' can never match. All that matters is whether a line matches the specified pattern.


    The PCRE regex engine performs the following operations.

    (?smi)                  : set single-line, multi-line and case-insensitive
                              modes
    ^                       : match the beginning of a line
    [a-z]+\.set_name\(\)    : match 1+ chars in the char class, followed
                              by the literal '.set_name()'
    (?!                     : begin negative lookahead
    .*^[a-z]+\.set_name\(\) : match 0+ chars (including newlines), the
                              beginning of a line, 1+ chars in the char
                              class and the literal '.set_name()'
    )                       : end negative lookahead
    

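    Recall that single-line mode causes . to match newlines, and multi-line mode causes ^ and $ to match the beginnings and ends of lines (rather than the beginning and end of the string). Because the negative lookahead fails whenever a later line also matches the pattern, only the last such line is matched.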