
Clojure re-find works but re-matcher fails for the same string-pattern combination

Why does clojure's re-matcher fail for the same string-pattern combo that re-find works for?

Regex Pattern: #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)"

Example String: [WARNING]: \tTimezone not set \r\n

Below is a console session using the pattern and string above, plus a second string that works with both re-find and re-matcher (with the same pattern).

user=> (def s1 "[ERROR]: This is an error.")
user=> (def s2 "[WARNING]:   \tTimezone not set  \r\n")
user=> (def rx #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)")

user=> (re-find rx s1)
["[ERROR]: This is an error." "ERROR" "This is an error."]
user=> (re-find rx s2)
["[WARNING]:   \tTimezone not set  " "WARNING" "  \tTimezone not set  "]

user=> (def m1 (re-matcher rx s1))
user=> (def m2 (re-matcher rx s2))

user=> (.matches m1)
true
user=> (.matches m2)
false

As the snippet shows, re-find works on string s2, but the matches method of the Matcher returned by re-matcher returns false for the same string-pattern combination.

I read that re-find uses the Matcher methods behind the scenes (ref), so what am I missing here?


  • user> (def s1 "[ERROR]: This is an error.")
    user> (def s2 "[WARNING]:   \tTimezone not set  \r\n")
    user> (def rx #"\[(?<level>[A-Z]*)\]:\s(?<msg>(?s:.)*)")
    user> (re-find rx s1)
    ["[ERROR]: This is an error." "ERROR" "This is an error."]
    user> (re-find rx s2)
    ["[WARNING]:   \tTimezone not set  \r\n"
     "WARNING"
     "  \tTimezone not set  \r\n"]
    user> (def m1 (re-matcher rx s1))
    user> (def m2 (re-matcher rx s2))
    user> (.matches m1)
    true
    user> (.matches m2)
    true
    user> (re-groups m2)
    ["[WARNING]:   \tTimezone not set  \r\n"
     "WARNING"
     "  \tTimezone not set  \r\n"]

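    As Wiktor has pointed out, the . in the regex does not match line terminators such as \r and \n, and re-find accepts a match on any substring, whereas .matches on the Matcher returned by re-matcher requires the whole string to match. With s2, the trailing \r\n is left unmatched, so .matches returns false. Wrapping the dot in (?s:...) turns on "single-line" (DOTALL) mode for that part of the pattern, so . also matches line terminators and the regex does what you intended. See the changed rx above.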

    Note, however, that this regex captures the EOL characters in the <msg> group. If you do not want them there, change the regex to:

    user> (def rx1 #"\[(?<level>[A-Z]*)\]:\s(?<msg>.*)(?s:.*)")
    user> (re-find rx1 s1)
    ["[ERROR]: This is an error." "ERROR" "This is an error."]
    user> (re-find rx1 s2)
    ["[WARNING]:   \tTimezone not set  \r\n"
     "WARNING"
     "  \tTimezone not set  "]
    user> (def m3 (re-matcher rx1 s1))
    user> (def m4 (re-matcher rx1 s2))
    user> (.matches m3)
    true
    user> (.matches m4)
    true
    user> (re-groups m4)
    ["[WARNING]:   \tTimezone not set  \r\n"
     "WARNING"
     "  \tTimezone not set  "]