
Positive Lookahead and Non-capturing group difference when using gsub


When you want to match either of two patterns but not capture the match, you use a non-capturing group, (?:...):

/(?:https?|ftp):\/\/(.+)/

But what if I want to capture '_1' in the string 'john_1'? It could be '_2', or '_' followed by anything else. First I tried a non-capturing group:

'john_1'.gsub(/(?:.+)(_.+)/, "")
=> ""

It does not work. I am telling it to not capture one or more characters but to capture _ and all characters after it.

Instead the following works:

'john_1'.gsub(/(?=.+)(_.+)/, "")
=> "john"

I used a positive lookahead. The definition I found for positive lookahead was as follows:

q(?=u) matches a q that is followed by a u, without making the u part of the match. The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign.

But that definition doesn't really fit my example. Why does the positive lookahead work in my example when the non-capturing group does not?


Solution

  • Capturing and matching are two different things. (?:expr) doesn't capture expr, but expr is still included in the matched string. Zero-width assertions, e.g. (?=expr), neither capture expr nor include it in the matched string; they only check that expr would match at that position.

    Perhaps some examples will help illustrate the difference:

    > "abcdef"[/abc(def)/] # => abcdef
    > $1 # => def
    
    > "abcdef"[/abc(?:def)/] # => abcdef
    > $1 # => nil
    
    > "abcdef"[/abc(?=def)/] # => abc
    > $1 # => nil
    

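    When you use a non-capturing group in your String#gsub call, it's still part of the match, so it gets replaced by the replacement string along with everything else. In your first example the whole of 'john_1' is matched and replaced. The lookahead consumes nothing, so in your second example the match starts at the underscore and only '_1' is replaced.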
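
To see this with the strings from the question, here is a minimal sketch that inspects what each pattern actually matches. The variable names are only for illustration. It also shows two alternatives that are not in the question: dropping the lookahead entirely, and capturing the prefix and putting it back with a backreference.

```ruby
name = 'john_1'

# Non-capturing group: "john" is not captured, but it is still consumed,
# so it is part of the overall match that gsub would replace.
m1 = name.match(/(?:.+)(_.+)/)
whole_non_capturing = m1[0]   # entire match
group_non_capturing = m1[1]   # first (and only) capture group

# Lookahead: zero-width, consumes nothing, so the match can only begin
# at the underscore.
m2 = name.match(/(?=.+)(_.+)/)
whole_lookahead = m2[0]
group_lookahead = m2[1]

# The lookahead adds nothing here; matching the suffix directly is enough.
stripped_plain = name.gsub(/_.+/, '')

# Alternative: match everything, then reinsert the captured prefix.
stripped_backref = name.gsub(/(.+)_.+/, '\1')

puts whole_non_capturing  # john_1
puts whole_lookahead      # _1
puts stripped_plain       # john
puts stripped_backref     # john
```

In both patterns the capture group holds '_1'. The only difference is the overall match, and the overall match is what gsub replaces.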