Clojure multiline regular expression


I'm trying to test a string against a basic HTML pattern. Although I use the m (multiline) modifier, the match only succeeds when the string is on a single line:

(re-find #"(?im)^<html>.*<body>.*</body>.*</html>" c)

Fails:

"<html>   <body>   sad   </body> 
     </html>"

Works:

"<html>   <body>   sad   </body>      </html>"

What am I doing wrong?


Solution

  • Disclaimer: I'm not a Clojure programmer, but I think this problem is independent of the language.

    When multiline mode is enabled, the meaning of the caret ^ and the dollar $ changes: instead of matching only at the beginning and end of the entire input string, they match at the beginning and end of each line in it. As far as I can see, that is not what you need here.

    What you want is for each .* to match newlines (which it doesn't do by default). You can get that by enabling single-line mode (also known as dot-all mode) with the s flag instead of m. So this means:

    (re-find #"(?is)^<html>.*<body>.*</body>.*</html>" c)
    

    You can also verify this on RegExr.
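
To check the difference between the two flags at a REPL, here is a minimal sketch. The var names (one-line, two-lines, multiline-re, dotall-re) are made up for illustration, and the test strings are only approximations of the two inputs from the question. It assumes Clojure on the JVM, where regex literals compile to java.util.regex patterns.

```clojure
;; Hypothetical inputs approximating the two strings from the question.
(def one-line  "<html>   <body>   sad   </body>      </html>")
(def two-lines "<html>   <body>   sad   </body> \n     </html>")

;; Pattern from the question: (?m) only changes where ^ and $ can match,
;; so . still refuses to cross a newline.
(def multiline-re #"(?im)^<html>.*<body>.*</body>.*</html>")

;; Pattern from the answer: (?s) lets . match line terminators too.
(def dotall-re #"(?is)^<html>.*<body>.*</body>.*</html>")

;; What (?m) actually does: ^ matches at the start of every line.
(println (re-seq #"^\w+" "foo\nbar"))      ; without m, ^ matches only at the start of the input
(println (re-seq #"(?m)^\w+" "foo\nbar"))  ; with m, ^ also matches after the newline

;; re-find returns the matched string, or nil when there is no match.
(println (some? (re-find multiline-re one-line)))
(println (some? (re-find multiline-re two-lines)))
(println (some? (re-find dotall-re one-line)))
(println (some? (re-find dotall-re two-lines)))
```

With these inputs, only the combination of the multiline pattern and the two-line string should fail to match, which is the behaviour described in the question.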