
Issue with a Look-behind Regular expression (Ruby)


I wrote this regex to match all href and src links in an HTML page (I know I should be using a parser; this is just an experiment):

/((href|src)\=\").*?\"/ # Without look-behind

It works fine, but when I try to modify the first portion of the expression as a look-behind pattern:

/(?<=(href|src)\=\").*?\"/ # With look-behind

It throws an error stating 'invalid look-behind pattern'. Any ideas what's going wrong with the look-behind?


Solution

  • Look-behind has restrictions (from the Oniguruma regex syntax documentation, which Ruby's regex engine follows):

       (?<=subexp)        look-behind
       (?<!subexp)        negative look-behind
    
                          Subexp of look-behind must be fixed character length.
                          But different character length is allowed in top level
                          alternatives only.
                          ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
    
                          In negative-look-behind, captured group isn't allowed, 
                          but shy group(?:) is allowed.
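A minimal sketch of the rule quoted above, using its own example patterns. The compiles? helper is hypothetical, written only for this illustration: it reports whether Regexp.new accepts a pattern. The last lines show the usual workaround of hoisting the inner alternation to the top level, where branches of different lengths are allowed.

```ruby
# Hypothetical helper: true if Ruby accepts +source+ as a regex,
# false if Regexp.new raises RegexpError.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Top-level alternatives of different lengths: allowed by the rule.
puts compiles?('(?<=a|bc)x')         # true

# Alternatives of different lengths nested inside a group:
# the form the quoted rule says is not allowed.
puts compiles?('(?<=aaa(?:b|cd))x')

# Same condition with the alternation hoisted to the top level,
# so each branch is fixed-length on its own.
hoisted = /(?<=aaab|aaacd)./
p "aaabX aaacdY".scan(hoisted)       # ["X", "Y"]
```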
    

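    You cannot put alternatives of different lengths below the top level of a (negative) look-behind. In your pattern, "href" (4 characters) and "src" (3 characters) are alternatives inside the group (href|src), so the look-behind has no single fixed length.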

    Move the alternation to the top level, so that each branch (href=" and src=") is fixed-length on its own. You also don't need to escape = or " inside a Ruby regex literal.

    /(?<=href="|src=").*?"/
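A usage sketch on a hypothetical sample string. Note that the answer's pattern still consumes the closing quote, so each result ends with ". The two variants after it are not from the answer: one adds a look-ahead (?=") so the quote is checked but not included, and the other avoids look-behind entirely by using a capture group, which String#scan returns on its own.

```ruby
# Hypothetical sample input, for illustration only.
html = '<a href="/about.html">About</a> <img src="logo.png">'

# The answer's pattern: each top-level look-behind branch is fixed-length.
with_quote = html.scan(/(?<=href="|src=").*?"/)
p with_quote   # ["/about.html\"", "logo.png\""]

# Variant: a look-ahead requires the closing quote without consuming it.
bare = html.scan(/(?<=href="|src=")[^"]*(?=")/)
p bare         # ["/about.html", "logo.png"]

# Variant without look-behind: scan returns only the capture group's text.
captured = html.scan(/(?:href|src)="([^"]*)"/).flatten
p captured     # ["/about.html", "logo.png"]
```

The capture-group version sidesteps the fixed-length restriction altogether, which can be simpler when the prefix alternatives grow more varied.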