
How to change MULTI-LINE-MODE and SINGLE-LINE-MODE?


It seems that neither of the two similar questions contains the information I am looking for.

I don't understand how the mode change works in CL-PPCRE. I tried both embedded modifiers and the keyword arguments. Can you explain the behaviour to me?

EDIT: The question has been answered. The main problem was, trivially, a wrong example string. The source code below is the original request (with 2 EDITS):

;; From: https://perldoc.perl.org/perlretut
;;    "Here are the four possible combinations:
;;     $x = "There once was a girl\nWho programmed in Perl\n";
;;     $x =~ /^Who/;   # doesn't match, "Who" not at start of string
;;     $x =~ /^Who/s;  # doesn't match, "Who" not at start of string
;;     $x =~ /^Who/m;  # matches, "Who" at start of second line
;;     $x =~ /^Who/sm; # matches, "Who" at start of second line
;;     $x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
;;     $x =~ /girl.Who/s;  # matches, "." matches "\n"
;;     $x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
;;     $x =~ /girl.Who/sm; # matches, "." matches "\n"

;; <<<<<<<<<EDIT: This was the main problem here:>>>>>>>>>>>
(defparameter *x* "There once was a lady\nWho programmed in Lisp.\n")

;; It was pointed out to me on #commonlisp that I had simply copied
;; the Perl string:
;; "There once was a girl\nWho programmed in Perl.\n".
;; I was so focused on the regexp that I treated the TARGET string
;; the same way. (So my first revision attempt actually was:
;; "Oh yes. You're right:
;; 'There once was a lady\\nWho programmed in Lisp.\\n'."
;; Goofy me.)
;; A Lisp string literal has no \n newline escape, so it should
;; have been, of course:
;; -----------------------------------------------------
;; (defparameter *x* (str:concat "There once was a lady"
;;                               (string #\Newline)
;;                               "Who programmed in Lisp."
;;                               (string #\Newline)))
;; ------------------------------------------------------

;; <<<<EDIT: And this was the second (misplaced modifiers)>>>>>
;;     At one point a condition was signaled when I put a modifier at the
;;     front, but apparently I was cross-eyed at that moment ...
(ppcre:scan "^Who" *x*)         ; => NIL
(ppcre:scan "^(?s)Who" *x*)     ; => NIL
(ppcre:scan "^(?m)Who" *x*)     ; => NIL -> unfortunately ... differs from tutorial
(ppcre:scan "^(?sm)Who" *x*)    ; => NIL -> as well
(ppcre:scan "lady.Who" *x*)     ; => 17, 25, #(), #() -> as well
(ppcre:scan "(?s)lady.Who" *x*) ; => 17, 25, #(), #() -> as well
(ppcre:scan "(?m)lady.Who" *x*) ; => 17, 25, #(), #() -> as well
(ppcre:scan "^(?sm)Who" *x*)    ; => NIL -> as well

;; Maybe I embedded the modifiers in the wrong place?
(let ((s (ppcre:create-scanner "^Who")))
  (ppcre:scan s *x*)) ; => NIL
(let ((s (ppcre:create-scanner "^Who" :single-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL
(let ((s (ppcre:create-scanner "^Who" :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL -> nnnnope
(let ((s (ppcre:create-scanner "lady.Who" :single-line-mode t
                                          :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> At least consistent
(let ((s (ppcre:create-scanner "lady.Who" :single-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> as well
(let ((s (ppcre:create-scanner "lady.Who" :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> as well
(let ((s (ppcre:create-scanner "^Who" :single-line-mode t
                                      :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL -> as well

The starting point of my confusion was this:

;; So (three let forms just for three explicit outputs):
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above

;; This seems to be the default setting. Now, let's try the opposite:

(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above

;; Oops.

;; The documentation:
;;   "* Consider using 'single-line mode' if it makes sense for your task.
;;      By default (following Perl's practice), a dot means to search for
;;      any character except line breaks. In single-line mode a dot searches
;;      for any character which in some cases means that large parts of
;;      the target can actually be skipped. This can be vastly more
;;      efficient for large targets."

;; So, by default :MULTI-LINE-MODE is T. But why is there no effect?

(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above
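
A note on the test strings used above: in a Common Lisp string literal the backslash only quotes the character that follows it, so "\n" reads as a plain n and "\\n" as a backslash followed by n; neither contains a newline. A minimal sketch of how a string with real newlines could be built in plain Common Lisp, without the str library (the name *x-fixed* is made up for illustration):

```lisp
;; A backslash in a string literal just quotes the next character,
;; so "\n" is the letter n, not a newline.
(length "a\nb")    ; => 3
(char "a\nb" 1)    ; => #\n

;; "\\n" is two characters: a backslash followed by n.
(length "\\n")     ; => 2

;; FORMAT's ~% directive emits a real newline.
;; *x-fixed* is a hypothetical name, not part of the original code.
(defparameter *x-fixed*
  (format nil "There once was a lady~%Who programmed in Lisp.~%"))

(char *x-fixed* 21)        ; => #\Newline
(search "Who" *x-fixed*)   ; => 22
```

With a target built this way, "Who" really does sit at the start of a second line, which is the situation the Perl tutorial describes.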

Thank you very much for your hints.


Solution

  • You have some problems with your regular expressions: "^(?m)Who" should be "(?m)^Who", for example. With that change,

    CL-USER> (ppcre:scan "(?m)^Who" *x*)
    22
    25
    #()
    #()
    

    The scanner has to be in multi-line mode before seeing the ^ so it knows to match at the start of a line, not just at the start of the string.

    Some others:

    • "lady.Who" should fail to match as . doesn't match newline. Indeed:
    CL-USER> (ppcre:scan "lady.Who" *x*) 
    NIL
    

    but you indicate in a comment in your code that it is matching. Are you sure *x* is what you think it is?

    • The next one,
    CL-USER> (ppcre:scan "(?s)lady.Who" *x*)
    17
    25
    #()
    #()
    

    you also say those results are unfortunate, but they're exactly what I would expect. Going into single-line mode makes . match a newline, after all.

    • and then
    CL-USER> (ppcre:scan "(?m)lady.Who" *x*)
    NIL
    

    multi-line mode doesn't change what . matches so a failure here is expected; but again your comments suggest it matches for you?

    • Finally, "^(?sm)Who" again has the ^ too early to be affected by the mode changes:
    CL-USER> (ppcre:scan "^(?sm)Who" *x*) 
    NIL
    CL-USER> (ppcre:scan "(?sm)^Who" *x*)
    22
    25
    #()
    #()
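
The keyword arguments from the question should behave the same way, since CL-PPCRE's create-scanner accepts :multi-line-mode and :single-line-mode as an alternative to the embedded modifiers. A minimal sketch, assuming Quicklisp is already loaded in the running image and that the target string contains real newlines (the name *x-fixed* is hypothetical):

```lisp
;; Assumption: Quicklisp is available to load CL-PPCRE.
(ql:quickload "cl-ppcre" :silent t)

;; *x-fixed* is a hypothetical name; ~% inserts real newlines.
(defparameter *x-fixed*
  (format nil "There once was a lady~%Who programmed in Lisp.~%"))

;; Multi-line mode: ^ also matches directly after a newline.
(ppcre:scan (ppcre:create-scanner "^Who" :multi-line-mode t) *x-fixed*)
;; => 22, 25, #(), #()

;; Single-line mode: . also matches a newline.
(ppcre:scan (ppcre:create-scanner "lady.Who" :single-line-mode t) *x-fixed*)
;; => 17, 25, #(), #()

;; With neither mode set, both patterns fail to match.
(ppcre:scan "^Who" *x-fixed*)      ; => NIL
(ppcre:scan "lady.Who" *x-fixed*)  ; => NIL
```

Because the keyword arguments apply to the whole pattern, they avoid the placement problem of an embedded modifier that appears after the ^ it is meant to affect.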