The typical use case is a regex that needs to include user input. Characters with special meaning in a regex (the "dirty dozen" in Perl) need to be escaped. Perl provides the "quotemeta" functionality for this: simply wrap the interpolated variables in \Q and \E. Tcl, however, provides no such functionality (not even with AREs, according to this page).
Is there a good (rigorous) implementation of quotemeta in Tcl out there?
Perl's quotemeta function simply puts a backslash in front of every non-word character (i.e., every character other than the 26 lowercase letters, the 26 uppercase letters, the 10 digits, and underscore). This is overkill, since not all non-word characters are regexp metacharacters, but it's simple and safe, since escaping a non-word character that doesn't need escaping is harmless.
I believe this implementation is correct:
proc quotemeta {str} {
    regsub -all -- {[^a-zA-Z0-9_]} $str {\\&} str
    return $str
}
But thanks to glenn's comment, this one is better, at least for modern versions of Tcl (\W matches any non-word character starting some time after Tcl 8.0.5):
proc quotemeta {str} {
    regsub -all -- {\W} $str {\\&} str
    return $str
}
(I'm assuming that Tcl's regular expressions are similar enough to Perl's so that this will do the same job in Tcl that it does in Perl.)
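As a rough illustration of the use case from the question, here is a minimal sketch of how the escaped string might be spliced into a larger pattern. The variable names userInput and pattern and the sample strings are made up for the example; the proc is repeated only so the snippet runs on its own.

```tcl
# Same definition as the \W version above, repeated so this runs standalone.
proc quotemeta {str} {
    regsub -all -- {\W} $str {\\&} str
    return $str
}

# Hypothetical user input that happens to contain regex metacharacters.
set userInput {a.b*c}

# Build an anchored pattern around the escaped input.
# \$ inside double quotes yields a literal $ (end-of-string anchor).
set pattern "^[quotemeta $userInput]\$"

puts [quotemeta $userInput]          ;# prints a\.b\*c
puts [regexp -- $pattern {a.b*c}]    ;# prints 1: the literal text matches
puts [regexp -- $pattern {aXbbc}]    ;# prints 0: . and * are no longer special
```

Without the escaping step, the second test string would match, because the unescaped input a.b*c would be read as "a, any character, zero or more b, then c".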