def get_hashtags(post)
  tags = []
  post.scan(/(?<![0-9a-zA-Z])(#+)([a-zA-Z]+)/) { |x, y| tags << y }
  tags
end
Test.assert_equals(get_hashtags("two hashs##in middle of word#"), [])
#Expected: [], instead got: ["in"]
Shouldn't the lookbehind ensure that the match is not preceded by a letter or digit? Why is it still accepting 'in' as a valid match?
You should use \K rather than a negative lookbehind. That allows you to simplify your regex considerably: no need for a pre-defined array, capture groups or a block.
\K means "discard everything matched so far". The key here is that variable-length matches can precede \K, whereas (in Ruby and most other languages) variable-length matches are not permitted in (negative or positive) lookbehinds.
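To illustrate how \K behaves with a variable-length prefix, here is a minimal sketch. The sample string and the variable name ids are made up for this illustration and are not part of the question.

```ruby
# [a-z]+ is a variable-length prefix: it has to match, but \K drops it
# from the reported match, so only the digits that follow are returned.
ids = "item42 and id7".scan(/[a-z]+\K\d+/)
p ids
#=> ["42", "7"]
```

Because nothing is captured, scan returns a flat array of strings rather than an array of groups.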
r = /
    [^0-9a-zA-Z#] # match one character that is not a letter, digit or pound sign
    \#+           # match one or more pound signs
    \K            # discard everything matched so far
    [a-zA-Z]+     # match one or more letters
    /x            # extended mode
Note that the # in \#+ would not need to be escaped if the regex were not written in extended mode, where an unescaped # starts a comment.
"two hashs##in middle of word#".scan r
#=> []
"two hashs&#in middle of word#".scan r
#=> ["in"]
"two hashs#in middle of word&#abc of another word.###def ".scan r
#=> ["abc", "def"]
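Putting this back into the method from the question, a sketch might look like the following. It also shows why the original pattern returned "in": the lookbehind only rejects a letter or digit, so the regex engine can start the match at the second # of "##in", which is preceded by # rather than by a letter. The method names get_hashtags_lookbehind and get_hashtags_anywhere are hypothetical names chosen for this illustration. The \A alternative in get_hashtags_anywhere is an assumption that a hashtag at the very start of the string should count, which the pattern above does not accept because it requires a preceding character.

```ruby
# The question's pattern, for comparison: at the second '#' in "##in" the
# preceding character is '#', which the lookbehind does not reject.
def get_hashtags_lookbehind(post)
  post.scan(/(?<![0-9a-zA-Z])#+([a-zA-Z]+)/).flatten
end

# The \K pattern: a character that is not a letter, digit or '#' must come
# first, then one or more '#'; \K keeps both out of the returned match.
def get_hashtags(post)
  post.scan(/[^0-9a-zA-Z#]#+\K[a-zA-Z]+/)
end

# Assumed variant: \A additionally allows a hashtag at the start of the string.
def get_hashtags_anywhere(post)
  post.scan(/(?:\A|[^0-9a-zA-Z#])#+\K[a-zA-Z]+/)
end

p get_hashtags_lookbehind("two hashs##in middle of word#") #=> ["in"]
p get_hashtags("two hashs##in middle of word#")            #=> []
p get_hashtags("#ruby is fun")                             #=> []
p get_hashtags_anywhere("#ruby is fun")                    #=> ["ruby"]
```

Whether a leading hashtag should be accepted depends on the exercise's rules, so pick the variant that matches the expected test results.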