Why does Clojure's re-matcher fail for the same string-pattern combo that re-find works for?
Regex Pattern: #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)"
Example String: [WARNING]: \tTimezone not set \r\n
Below is a console session using the above pattern and string, along with another string that works with both re-find and re-matcher (with the same pattern).
user=> (def s1 "[ERROR]: This is an error.")
#'user/s1
user=> (def s2 "[WARNING]: \tTimezone not set \r\n")
#'user/s2
user=> (def rx #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)")
#'user/rx
user=> (re-find rx s1)
["[ERROR]: This is an error." "ERROR" "This is an error."]
user=> (re-find rx s2)
["[WARNING]: \tTimezone not set " "WARNING" " \tTimezone not set "]
user=> (def m1 (re-matcher rx s1))
#'user/m1
user=> (def m2 (re-matcher rx s2))
#'user/m2
user=> (.matches m1)
true
user=> (.matches m2)
false
As the snippet shows, re-find works on string s2, but the matches method of the Matcher returned by re-matcher returns false for the same string-pattern combination.
I read that re-find uses the Matcher methods behind the scenes (ref), so what am I missing here?
user> (def s1 "[ERROR]: This is an error.")
#'user/s1
user> (def s2 "[WARNING]: \tTimezone not set \r\n")
#'user/s2
user> (def rx #"\[(?<level>[A-Z]*)\]:\s(?<msg>(?s:.)*)")
#'user/rx
user> (re-find rx s1)
["[ERROR]: This is an error." "ERROR" "This is an error."]
user> (re-find rx s2)
["[WARNING]: \tTimezone not set \r\n"
"WARNING"
" \tTimezone not set \r\n"]
user> (def m1 (re-matcher rx s1))
#'user/m1
user> (def m2 (re-matcher rx s2))
#'user/m2
user> (.matches m1)
true
user> (.matches m2)
true
user> (re-groups m2)
["[WARNING]: \tTimezone not set \r\n"
"WARNING"
" \tTimezone not set \r\n"]
As Wiktor has pointed out, the . in the regex does not match EOL (end-of-line) markers, and re-find finds any substring match, whereas .matches on the Matcher from re-matcher requires the whole string to match. So, if you use (?s:...) to enable "single-line" (DOTALL) mode, you get a regex that does what you intended. See the regex change above.
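To make the difference concrete, here is a minimal sketch that runs the question's original pattern through .find, .matches and re-matches, and then checks how . treats a carriage return. The var names (found?, whole?, via-re-matches, dot-cr?, dotall-cr?) are made up for illustration.

```clojure
(def rx #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)")
(def s2 "[WARNING]: \tTimezone not set \r\n")

;; .find looks for the next substring that matches the pattern.
;; This is what re-find calls on the Matcher.
(def found? (.find (re-matcher rx s2)))        ; true

;; .matches requires the entire input to match. The trailing \r\n
;; is not consumed by . so the match fails.
(def whole? (.matches (re-matcher rx s2)))     ; false

;; re-matches is the Clojure function built on .matches.
(def via-re-matches (re-matches rx s2))        ; nil

;; By default . does not match line terminators such as \r and \n;
;; the (?s) flag (DOTALL) lifts that restriction.
(def dot-cr?    (some? (re-matches #"." "\r")))      ; false
(def dotall-cr? (some? (re-matches #"(?s)." "\r")))  ; true
```

So re-find and (.matches (re-matcher ...)) are not equivalent calls: the closest Clojure counterpart to .matches is re-matches, not re-find.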
But note that the (?s:.) regex above includes the EOL chars in the <msg> group. If you do not want them in the <msg> group, change the regex to:
user> (def rx1 #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)(?s:.*)")
#'user/rx1
user> (re-find rx1 s1)
["[ERROR]: This is an error." "ERROR" "This is an error."]
user> (re-find rx1 s2)
["[WARNING]: \tTimezone not set \r\n"
"WARNING"
" \tTimezone not set "]
user> (def m3 (re-matcher rx1 s1))
#'user/m3
user> (def m4 (re-matcher rx1 s2))
#'user/m4
user> (.matches m3)
true
user> (.matches m4)
true
user> (re-groups m4)
["[WARNING]: \tTimezone not set \r\n"
"WARNING"
" \tTimezone not set "]
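Since the pattern uses named groups, the values can also be read by name rather than by position. The following is only a sketch: parse-line is a hypothetical helper, not part of the question or of clojure.core, and it assumes a JVM whose java.util.regex.Matcher supports group(String), which is Java 7 and later.

```clojure
(def rx1 #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)(?s:.*)")

(defn parse-line
  "Hypothetical helper: returns a map of the named groups when the
  whole of s matches rx1, otherwise nil."
  [s]
  (let [m (re-matcher rx1 s)]
    ;; .matches must succeed before any group can be read.
    (when (.matches m)
      {:level (.group m "level")
       :msg   (.group m "msg")})))

(parse-line "[ERROR]: This is an error.")
;; => {:level "ERROR", :msg "This is an error."}

(parse-line "[WARNING]: \tTimezone not set \r\n")
;; => {:level "WARNING", :msg "\tTimezone not set "}

(parse-line "no level here")
;; => nil
```

The trailing (?s:.*) swallows the line terminators, so .matches succeeds while <msg> stops before \r\n.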