Tags: regex, ruby, regex-lookarounds, gsub, regexp-replace

SyntaxError: (irb):4: invalid pattern in look-behind (positive look-behind/ahead)


I am trying to write a regex-replace pattern to replace a number in a hash like this:


some_dict = {
  TEST: 123
}

such that 123 could be captured and replaced.

(?<= |\t*[a-zA-Z0-9_]+: |\t+)\d+(?=.*)

You'll see that this works perfectly fine in regexr.

When I run this gsub in irb, however, here is what happens:

irb(main):005:0> "  TEST: 123".gsub(/(?<= |\t*[a-zA-Z0-9_]+: |\t+)\d+(?=.*)/, "321")
SyntaxError: (irb):5: invalid pattern in look-behind: /(?<= |\t*[a-zA-Z0-9_]+: |\t+)\d+(?=.*)/

I looked around for similar issues, such as "Invalid pattern in look-behind", but I made sure to exclude capture groups from my look-behind, so I'm really not sure where the problem lies.


Solution

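  • The error occurs because Ruby's Onigmo regex engine does not support lookbehind patterns of unbounded width. Here the \t*, [a-zA-Z0-9_]+ and \t+ parts can each match arbitrarily long text, so the lookbehind is rejected when the pattern is compiled.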

    In the general case, a positive lookbehind that contains quantifiers like *, + or {x,} can often be replaced with a consuming pattern followed by \K, which drops everything matched so far from the reported match:

    /(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/
    #^^^                         ^^  
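
As a quick check of the \K rewrite, here is a minimal sketch run against the sample line from the question. The variable names (line, pattern, result) are only illustrative and do not come from the original post.

```ruby
# Sample input taken from the question's irb session.
line = "  TEST: 123"

# The prefix is now a consuming non-capturing group. \K resets the start of
# the reported match, so only the digits after it are replaced by gsub.
pattern = /(?: |\t*[a-zA-Z0-9_]+: |\t+)\K\d+(?=.*)/

result = line.gsub(pattern, "321")
puts result  # prints "  TEST: 321"
```

Unlike a lookbehind, the text before \K is still consumed by the match, so it cannot be reused as the prefix of a later match in the same gsub call. That makes no difference for this input, but it can matter when matches sit directly next to each other.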
    

    However, you do not even need a pattern that complicated. (?=.*) is redundant: .* matches even an empty string, so the lookahead always succeeds. Every alternative in the lookbehind ends with a space or a tab, and the first alternative already matches any single space, so the lookbehind succeeds exactly when there is a space or tab immediately to the left of the current location. The regex is therefore equivalent to

    .gsub(/(?<=[ \t])\d+/, "321")
    

    where the pattern matches

    • (?<=[ \t]) - a location immediately preceded by a space or tab
    • \d+ - one or more digits.
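
To illustrate the simplified pattern, the sketch below applies it to a few inputs. Only the first string comes from the question; the others are hypothetical samples added for this example, and the names simple, samples and results are illustrative.

```ruby
# Fixed-width lookbehind: a single space or tab must precede the digits.
simple = /(?<=[ \t])\d+/

samples = [
  "  TEST: 123",   # from the question: digits after a space
  "\tTEST:\t42",   # hypothetical: digits after a tab
  "TEST:123",      # hypothetical: no space or tab before the digits
  "a 1 b 2"        # hypothetical: several matches in one string
]

results = samples.map { |s| s.gsub(simple, "321") }

results.each { |r| puts r.inspect }
# "  TEST: 321"
# "\tTEST:\t321"
# "TEST:123"
# "a 321 b 321"
```

The third sample is left unchanged because its digits follow a colon directly, so the lookbehind never succeeds there.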