ruby regex word-boundary

Regexp union with word boundaries


I have a list of patterns and want to match a string against them, but only as entire words. I was looking for a way to dynamically insert word boundaries into the patterns passed to Regexp.union, but I am missing something. Here is what I have tried:

test_string = "lonewolf is lonely"
pattern_list = ["lonely", "wolf", "jungle"]
pattern_list.collect! { |pattern| pattern = "\b" + pattern + "\b"}
patterncollection = Regexp.union(pattern_list)
puts patterncollection
puts test_string.scan(patterncollection)

The results are empty, and when I print the pattern collection I see that "\b" doesn't get escaped correctly. I cannot insert the "\b" directly in the array, as that list is retrieved dynamically. I have tried more than one option, but still no luck. Different approaches to the problem are welcome.


Solution

  • The easiest solution would be to move word boundary matchers outside of the union:

    /\b(#{Regexp.union(pattern_list).source})\b/
    
    ▶ "lonewolf is lonely".scan /\b(#{Regexp.union(%w|lonely wolf jungle|).source})\b/
    #⇒ [
    #    [0] [
    #        [0] "lonely"
    #    ]
    #  ]
    

    Please also refer to the significant comment below. Basically, it advises: “Use source unless you are absolutely positive you know what will happen.” (the Tin Man)

    I updated the answer accordingly.
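For reference, here is a minimal sketch of why the attempt in the question comes back empty and how the approach above behaves. The helper name `whole_word_regexp` is hypothetical and only used for illustration, and the non-capturing group `(?:...)` is a small variation on the answer's capturing group, chosen so that `scan` returns a flat array. It also shows one case where interpolating the union directly (which uses `to_s`) differs from interpolating its `source`.

```ruby
# Hypothetical helper, not part of the original answer: wraps a union of
# literal words in word boundaries.
def whole_word_regexp(words)
  # Regexp.union escapes every String argument, so the dynamically
  # retrieved words are matched literally.
  union = Regexp.union(words)
  # (?:...) groups without capturing, so String#scan returns plain strings.
  /\b(?:#{union.source})\b/
end

pattern_list = ["lonely", "wolf", "jungle"]

# 1. In a double-quoted string, "\b" is the backspace character (code 8),
#    not the two characters backslash + b.
puts "\b".ord

# 2. Even with single quotes, Regexp.union escapes the backslash, so the
#    resulting pattern looks for a literal backslash followed by "b".
puts Regexp.union(['\bwolf\b']).source

# 3. Keeping the boundaries outside the union matches whole words only.
p "lonewolf is lonely".scan(whole_word_regexp(pattern_list))
# => ["lonely"]

# 4. source vs. to_s: interpolating the Regexp itself embeds its flags as
#    (?-mix:...), which switches the outer /i off inside the group.
union = Regexp.union(pattern_list)
with_to_s   = /\b(?:#{union})\b/i
with_source = /\b(?:#{union.source})\b/i
p "LONELY wolf".scan(with_to_s)
# => ["wolf"]
p "LONELY wolf".scan(with_source)
# => ["LONELY", "wolf"]
```

Note that `\b` only behaves as expected when each word starts and ends with a word character; entries with leading or trailing punctuation would need a different boundary check.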