I'm trying to write a regex that will split out all words starting with a hashtag.
For example, with the following text it should behave like this:
val regex = "???".r
val text = "#shouldMatch1 #shouldMatch2 notMatch nope#shouldMatch3 nooope()#shouldMatch4"
regex.split(text).toList shouldBe List("#shouldMatch1", "#shouldMatch2", "#shouldMatch3", "#shouldMatch4")
The closest I could get is val regex: Regex = "[^#\\w+]".r, but it splits a little too much:
List("#shouldMatch1", "#shouldMatch2", "notMatch", "nope#shouldMatch3", "nooope", "#shouldMatch4")
So in some cases it finds words that do not start with a hashtag. Do you have any idea or guidance on how I should write a proper expression?
The code is written in Scala, but it should be similar in Java.
You need to use findAllIn with a regex like #\w+:
val regex = """#\w+""".r
val text = "#shouldMatch1 #shouldMatch2 notMatch nope#shouldMatch3 nooope()#shouldMatch4"
println(regex.findAllIn(text).toList)
See the Scala demo.
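To illustrate why findAllIn fits better than split here, the following minimal sketch runs both against the text from the question. The object name SplitVsFindAll and the value names are made up for this illustration; only the regex and the text come from the post.

```scala
object SplitVsFindAll {
  // Text and pattern taken from the question and the answer above.
  val text = "#shouldMatch1 #shouldMatch2 notMatch nope#shouldMatch3 nooope()#shouldMatch4"
  val regex = """#\w+""".r

  // split treats every match as a delimiter and returns the text BETWEEN matches,
  // so the hashtags themselves are thrown away.
  val betweenMatches: List[String] = regex.split(text).toList

  // findAllIn returns the matched substrings, which is what the question asks for.
  val hashtags: List[String] = regex.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(betweenMatches)
    println(hashtags)
  }
}
```

With split, the pattern has to describe everything that is not a hashtag, which is much harder to express than describing the hashtag itself.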
The hashtag-matching pattern can vary depending on what you count as a hashtag. Here are some of the variations:
#\w+ - if the hashtags can contain only word chars
#[\w-]+ - if the hashtags can contain only word and hyphen chars
#\S+ - if the hashtags contain one or more non-whitespace chars after #
#\S+\b - if the hashtags contain one or more non-whitespace chars after #, but you want the match to stop before the final sequence of non-word chars (like a comma, etc.)
(?<!\S)#\S+ - if the hashtags contain one or more non-whitespace chars after #, but before # there can only be whitespace or the start of the string.
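As a rough comparison, the sketch below applies each of the variations listed above to one sample string. The sample text, the object name HashtagPatterns and the helper extract are hypothetical and exist only for this illustration. The expected results in the comments assume the standard java.util.regex behaviour that Scala's Regex relies on.

```scala
object HashtagPatterns {
  // Hypothetical sample: a hyphenated tag followed by a comma, a tag glued
  // to a preceding word, and a tag followed by punctuation.
  val sample = "#scala-lang, foo#bar #done!"

  // Hypothetical helper: all matches of `pattern` in `text`, in order.
  def extract(pattern: String, text: String): List[String] =
    pattern.r.findAllIn(text).toList

  // Word chars only: stops at the hyphen.
  val wordOnly = extract("""#\w+""", sample)            // #scala, #bar, #done

  // Word chars and hyphens: stops at the comma and the exclamation mark.
  val wordOrHyphen = extract("""#[\w-]+""", sample)     // #scala-lang, #bar, #done

  // Any non-whitespace: trailing punctuation is included.
  val nonSpace = extract("""#\S+""", sample)            // #scala-lang, | #bar | #done!

  // Non-whitespace, backtracking to the last word boundary: punctuation is trimmed.
  val trimmed = extract("""#\S+\b""", sample)           // #scala-lang, #bar, #done

  // Only tags preceded by whitespace or start of string: foo#bar is skipped.
  val standalone = extract("""(?<!\S)#\S+""", sample)   // #scala-lang, | #done!

  def main(args: Array[String]): Unit =
    List(wordOnly, wordOrHyphen, nonSpace, trimmed, standalone).foreach(println)
}
```

Note that only the last pattern rejects nope#shouldMatch3 style input, so it would not satisfy the test from the question, where hashtags glued to a preceding word are expected to match.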