In elisp, is there a difference between the regex [\\]documentclass and \\documentclass?


I was playing around with the rx macro, which generates regular expressions from sexps in Elisp, but couldn't figure out how to generate the regular expression "\\documentclass" for use in org-export-latex-classes:

    (rx "\\documentclass")
    (rx "\\" "documentclass")
    (rx (char "\\") "documentclass")

When evaluated, these give the following outputs, respectively:

    "\\\\documentclass"
    "\\\\documentclass"
    "[\\]documentclass"

Is "\\documentclass" equivalent to "[\\]documentclass"? I think it is, but I am not sure. Can I generate the former using rx?

Edit: Whilst the question is valid, I realize my motivation was not, because org-export-latex-classes uses strings, not regular expressions.


Solution

  • Emacs requires \ to be escaped in the double-quoted read syntax for strings, so when the Lisp reader processes the code, "\\" becomes a string object containing a single \ character. That single backslash is what the regexp engine sees when it uses the string.
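A quick check of what the reader produces, as a sketch you can evaluate with M-: or in `emacs --batch` (the values in the comments follow from the read syntax described above; only built-in functions are used):

```elisp
;; The reader turns the source text "\\" into a string that holds
;; exactly one backslash character.
(length "\\")                ; => 1
(aref "\\" 0)                ; => 92, the character code of ?\\
(string= "\\" (string ?\\))  ; => t
```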

    However, a \ in a regexp also has an escaping function, which means that the sequence \\ in a regexp matches a single \.

    To represent the sequence \\ in (the read syntax for) Emacs strings, each of those backslashes must itself be escaped by prefixing it with a backslash.

    Therefore "\\\\" evaluates to a string containing \\, which can be used as a regexp matching a single \.
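A small sketch of that two-level escaping, using the built-in string-match and match-end (the sample text "a\\b" is just an illustrative three-character string a\b):

```elisp
;; "\\\\" is read as the two-character string \\ , which the regexp
;; engine interprets as "match one literal backslash".
(length "\\\\")              ; => 2
(string-match "\\\\" "a\\b")  ; => 1, the index of the backslash in a\b
(match-end 0)                 ; => 2, so exactly one character matched
```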

    Within a regexp character alternative, however, backslashes are literal (they do not escape the following character), so [\], represented by the string "[\\]", matches a single backslash, the only member of that one-character set.

    So used as regexps, the strings "\\\\" and "[\\]" match the same thing.
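To see the equivalence in practice, here is a sketch that runs both regexps against the same sample texts (the strings "\\documentclass{article}" and "documentclass" are illustrative inputs, not taken from org-export-latex-classes). It also shows that rx quotes its string argument with a doubled backslash, so (rx "\\documentclass") already yields the first form:

```elisp
;; Text that really starts with a backslash: \documentclass{article}
(string-match "\\\\documentclass" "\\documentclass{article}")  ; => 0
(string-match "[\\]documentclass" "\\documentclass{article}")  ; => 0

;; Without the backslash, neither regexp matches.
(string-match "\\\\documentclass" "documentclass")  ; => nil
(string-match "[\\]documentclass" "documentclass")  ; => nil

;; rx regexp-quotes literal strings, producing the \\ form.
(rx "\\documentclass")  ; => "\\\\documentclass"
```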

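    The string "\\documentclass", used as a regexp, is effectively the same as "documentclass" with no backslash at all: the reader leaves a single \ before the d, and the regexp engine treats \d as an escaped, and therefore ordinary, d (valid, but of course unnecessary). So it is not equivalent to "[\\]documentclass", because it does not require a backslash in the matched text.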
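A sketch illustrating the point, assuming the behaviour described above (an unrecognized escape such as \d in an Emacs regexp stands for the plain character):

```elisp
;; "\\documentclass" is read as \documentclass; in the regexp, \d is just d.
(string-match "\\documentclass" "documentclass")    ; => 0
;; Against text that starts with a backslash, the match skips past it.
(string-match "\\documentclass" "\\documentclass")  ; => 1
```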

    The elisp manual explains this as follows:

    `\' has two functions: it quotes the special characters (including
    `\'), and it introduces additional special constructs.
    
    Because `\' quotes special characters, `\$' is a regular
    expression that matches only `$', and `\[' is a regular expression
    that matches only `[', and so on.
    
    Note that `\' also has special meaning in the read syntax of Lisp
    strings (*note String Type::), and must be quoted with `\'.  For
    example, the regular expression that matches the `\' character is
    `\\'.  To write a Lisp string that contains the characters `\\',
    Lisp syntax requires you to quote each `\' with another `\'.
    Therefore, the read syntax for a regular expression matching `\'
    is `"\\\\"'.
    
    [...]
    
    As a `\' is not special inside a character alternative, it can never
    remove the special meaning of `-' or `]'.  So you should not quote
    these characters when they have no special meaning either.  This would
    not clarify anything, since backslashes can legitimately precede these
    characters where they _have_ special meaning, as in `[^\]' (`"[^\\]"'
    for Lisp string syntax), which matches any single character except a
    backslash.
    

    C-h i g (elisp) Regexp Special RET
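The manual's [^\] example can be checked the same way. This is a small sketch with illustrative sample strings; "\\\\x" is the three-character text \\x:

```elisp
;; "[^\\]" is the regexp [^\], which matches any character except a backslash.
(string-match "[^\\]" "\\\\x")  ; => 2, the first non-backslash character
(string-match "[^\\]" "\\\\")   ; => nil, the text contains only backslashes
```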