So I am creating a WML-like language for my assignment and, as a first step, I am supposed to create regular expressions to recognize the following:
//single = "{"
//double = "{{"
//triple = "{{{"
here is my code for the second one:
val double = "\\{\\{\\b".r
and my Test is:
println(double.findAllIn("{{ s{{ { {{{ {{ {{x").toArray.mkString(" "))
But it doesn't print anything! It's supposed to print the first, second, fifth and sixth tokens. I have tried every single combination of \b and \B, and even \{{2,2} instead of \{\{, but it's still not working. Any help?
As a side question, if I wanted it to match just the first and fifth tokens, what would I need to do?
I tested your code (Scala 2.12.2 REPL) and, contrary to your "it doesn't print anything" statement, it actually prints the "{{" occurrence from the "{{x" substring.
This is because x is a word character and \b matches the position between the second { and x. Keep in mind that { isn't a word character, unlike x.
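To make the word-boundary behaviour concrete, here is a minimal sketch that runs the question's pattern against the question's input and records where each match starts. The object name WordBoundaryDemo and the starts value are illustrative names, not part of the original code.

```scala
object WordBoundaryDemo {
  // Input string taken from the question.
  val input = "{{ s{{ { {{{ {{ {{x"

  // The question's original pattern: "{{" followed by a word boundary.
  val double = "\\{\\{\\b".r

  // Start offset of every match in the input.
  val starts: List[Int] = double.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit =
    println(starts) // List(16): only the "{{" directly before "x" matches
}
```

Everywhere else "{{" is followed by a space or another "{", so there is no word character on either side and \b cannot match.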
As per this tutorial
It matches at a position that is called a "word boundary". This match is zero-length
There are three different positions that qualify as word boundaries:
1) Before the first character in the string, if the first character is a word character
...
As for a solution, it depends on the precise definition, but lookarounds worked for me:
"(?<!\\{)\\{{2}(?!\\{)".r
It matched "first, second, fifth and 6th token". The expression says: match "{{" that is neither preceded nor followed by "{".
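As a quick check of that claim, the sketch below applies the lookaround pattern to the question's input and lists the start offsets of the matches. The object and value names are made up for illustration.

```scala
object LookaroundDoubleDemo {
  // Input string taken from the question.
  val input = "{{ s{{ { {{{ {{ {{x"

  // "{{" that is neither preceded nor followed by another "{".
  val double = "(?<!\\{)\\{{2}(?!\\{)".r

  // Start offset of every match in the input.
  val starts: List[Int] = double.findAllMatchIn(input).map(_.start).toList

  def main(args: Array[String]): Unit =
    println(starts) // List(0, 4, 13, 16): tokens 1, 2, 5 and 6
}
```

The lone "{" and the "{{{" token are skipped, because any pair of braces inside "{{{" has a "{" either directly before or directly after it.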
For the side question:
"(?<![^ ])\\{\\{(?![^ ])".r //match `{{` surrounded by spaces or line boundaries
Or, depending on your interpretation of "space":
"(?<!\\S)\\{\\{(?!\\S)".r
Both expressions matched the 1st and 5th tokens. I couldn't use plain positive lookarounds because I wanted to take line beginnings and endings (boundaries) into account automatically. So the double negation via ! and [^ ] has the effect of implicitly including ^ and $. Alternatively, you could use:
"(?<=^|\\s)\\{\\{(?=\\s|$)".r
You can read about lookarounds here. Basically, they assert that a symbol or expression appears at the edge of the match; simply put, they check for text but don't include it in the matched string itself.
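To compare the three side-question patterns given above, the following sketch runs each one against the question's input and collects the match offsets. WhitespaceDelimitedDemo, patterns and results are illustrative names only.

```scala
object WhitespaceDelimitedDemo {
  // Input string taken from the question.
  val input = "{{ s{{ { {{{ {{ {{x"

  // The three variants for the side question, as plain strings.
  val patterns: List[String] = List(
    "(?<![^ ])\\{\\{(?![^ ])",   // double negation with a literal space
    "(?<!\\S)\\{\\{(?!\\S)",     // double negation with any whitespace
    "(?<=^|\\s)\\{\\{(?=\\s|$)"  // positive lookarounds with explicit ^ and $
  )

  // For each pattern, the start offsets of its matches.
  val results: List[List[Int]] =
    patterns.map(p => p.r.findAllMatchIn(input).map(_.start).toList)

  def main(args: Array[String]): Unit =
    results.foreach(println) // each line is List(0, 13): tokens 1 and 5
}
```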
Some examples of lookarounds:
(?<=z)aaa matches "aaa" that is preceded by z
(?<!z)aaa matches "aaa" that is not preceded by z
aaa(?=z) matches "aaa" followed by z
aaa(?!z) matches "aaa" not followed by z
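The four forms can be tried out with a small sketch like the one below. The test strings ("zaaa", "xaaa", "aaaz", "aaab") and all names are chosen here purely for illustration.

```scala
object LookaroundBasicsDemo {
  val lookbehind    = "(?<=z)aaa".r
  val negLookbehind = "(?<!z)aaa".r
  val lookahead     = "aaa(?=z)".r
  val negLookahead  = "aaa(?!z)".r

  val results: List[Option[String]] = List(
    lookbehind.findFirstIn("zaaa"),    // Some(aaa): z is required but not part of the match
    lookbehind.findFirstIn("xaaa"),    // None: not preceded by z
    negLookbehind.findFirstIn("xaaa"), // Some(aaa): preceded by x, not z
    negLookbehind.findFirstIn("zaaa"), // None: preceded by z
    lookahead.findFirstIn("aaaz"),     // Some(aaa): z is required but not part of the match
    negLookahead.findFirstIn("aaaz"),  // None: followed by z
    negLookahead.findFirstIn("aaab")   // Some(aaa): followed by b, not z
  )

  def main(args: Array[String]): Unit =
    results.foreach(println)
}
```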
P.S. Just to make your life easier, Scala has triple-quoted (""") raw string literals, in which backslashes don't need to be escaped, so instead of:
"(?<!\\S)\\{\\{(?!\\S)".r
you can just write:
"""(?<!\S)\{\{(?!\S)""".r