regex pcre

Match a pattern not preceded by a quotation mark


I have this pattern (?<!')(\w*)\((\d+|\w+|.*,*)\) that is meant to match strings like:

  • c(4)
  • hello(54, 41)

Following some answers on SO, I added a negative lookbehind so that if the input string is preceded by a ', the string shouldn't match at all. However, it still partially matches.

For example:

'c(4) returns (4) even though it shouldn't match anything because of the negative lookbehind.

How do I make it so that if a string is preceded by ', nothing matches at all?


Solution

  • Since nobody came along, I'll throw this out to get you started.

    The full regex further down will match things like

    aa(a , sd,,,f,)
    aa( as , " ()asdf)) " ,, df, , )
    asdf()

    but not

    'ab(s)

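    This will fix the basic problem: (?<!['\w])\w*
    Your lookbehind only rejects a quote, so in 'c(4) the engine simply starts
    the match one character later, at the ( where the preceding character is c.
    (?<!['\w]) also rejects a preceding word character, so the engine cannot
    skip over part of the name just to get away from the quote.
    The optional \w* then grabs the whole function name.
    So if a quote comes right before the name, as in 'aaa(, nothing matches.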
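
To see the difference concretely, here is a minimal Python sketch that runs the question's pattern and the pattern with the widened lookbehind over the same inputs. It uses Python's built-in re module rather than PCRE (both patterns only use syntax the two engines share), and the helper name first_match is made up for this illustration.

```python
import re

# Pattern from the question: the lookbehind only rejects a quote.
original = re.compile(r"(?<!')(\w*)\((\d+|\w+|.*,*)\)")

# Suggested fix: the lookbehind rejects a quote OR a word character.
fixed = re.compile(r"(?<!['\w])(\w*)\((\d+|\w+|.*,*)\)")


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


for text in ["c(4)", "hello(54, 41)", "'c(4)"]:
    print(repr(text), "->", first_match(original, text), "|", first_match(fixed, text))

# 'c(4)'          -> c(4) | c(4)
# 'hello(54, 41)' -> hello(54, 41) | hello(54, 41)
# "'c(4)"         -> (4) | None
```

The last line is the case from the question: the original pattern still finds (4), while the widened lookbehind finds nothing.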

    This regex here embellishes what I think you are trying to accomplish
    in the function body part of your regex.
    It might be a little overwhelming to understand at first.

    (?s)(?<!['\w])(\w*)\(((?:,*(?&variable)(?:,+(?&variable))*[,\s]*)?)\)(?(DEFINE)(?<variable>(?:\s*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')\s*|[^()"',]+)))

    Readable version (via: http://www.regexformat.com)

     (?s)                          # Dot-all modifier
    
     (?<! ['\w] )                  # Not preceded by a quote or a word character
                                   # <- This forces the match to start at the beginning of the
                                   #    function name (if any), so a preceding quote ' blocks it
    
     ( \w* )                       # (1), Function name (optional)
     \(
     (                             # (2 start), Function body
          (?:                           # Parameters (optional)
               ,*                            # Leading commas (optional)
               (?&variable)                  # Subroutine call: first variable (required)
               (?:                           # More variables (optional)
                    ,+                            # One or more commas (required)
                    (?&variable)                  # Subroutine call: next variable (required)
               )*
               [,\s]*                        # Trailing whitespace or commas (optional)
          )?                            # End parameters (optional)
     )                             # (2 end)
     \)
    
     # Subroutine definitions (only used through the (?&variable) calls above)
     (?(DEFINE)
          (?<variable>                  # (3 start), Subroutine for a single variable
               (?:
                    \s* 
                    (?:                           # Double or single quoted string
                         "                            
                         [^"\\]* 
                         (?: \\ . [^"\\]* )*
                         "
                      |  
                         '                      
                         [^'\\]* 
                         (?: \\ . [^'\\]* )*
                         '
                    )
                    \s*     
                 |                              # or,
                    [^()"',]+                     # Not quote, paren, comma (can be whitespace)
               )
          )                             # (3 end)
     )
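
If you want to try the full pattern outside PCRE, note that Python's built-in re module does not support (?(DEFINE)...) or (?&name) subroutine calls. The sketch below therefore pastes the variable sub-pattern in by string concatenation. Treat it as an approximation under that assumption rather than the same regex: backtracking into the inlined group may differ from a PCRE subroutine call in edge cases. The names VARIABLE, CALL and parse_call are hypothetical, chosen only for this example.

```python
import re

# One "variable": a double- or single-quoted string (with backslash escapes)
# surrounded by optional whitespace, or a run of characters that are not
# parentheses, quotes, or commas. Non-capturing, so groups 1 and 2 keep
# the same meaning as in the PCRE version.
VARIABLE = r"""(?:\s*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')\s*|[^()"',]+)"""

# Same layout as the PCRE pattern, with (?&variable) replaced by VARIABLE.
CALL = re.compile(
    r"(?<!['\w])(\w*)\(((?:,*" + VARIABLE + r"(?:,+" + VARIABLE + r")*[,\s]*)?)\)",
    re.DOTALL,  # stands in for the (?s) modifier
)


def parse_call(text):
    """Return (name, body) for the first call found in text, or None."""
    m = CALL.search(text)
    return (m.group(1), m.group(2)) if m else None


samples = [
    "aa(a , sd,,,f,)",
    'aa( as , " ()asdf)) " ,, df, , )',
    "asdf()",
    "'ab(s)",
]
for s in samples:
    print(repr(s), "->", parse_call(s))
```

With these four inputs, the first three are matched in full and 'ab(s) gives None, which lines up with the examples at the top of this answer.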