
Character escaping in keymap file


Here is a keymap that auto-pairs asterisks (for Markdown and AsciiDoc files). It works. Questions:

1) Should I escape the asterisk in the 6th line of the 1st block or not?

2) In my tests, the escaping in this line makes no difference. But why? If you remove the escaping in the 5th and 6th lines of the last block (or in the marked line of the 3rd block), the bindings behave incorrectly, so the escaping is required there. It does not seem to be required in the 1st block, which confuses me.

[
    // Auto-pair *
    { "keys": ["*"], "command": "insert_snippet", "args": {"contents": "*$0*"}, "context":
        [
            { "key": "setting.auto_match_enabled", "operator": "equal", "operand": true },
            { "key": "selection_empty", "operator": "equal", "operand": true, "match_all": true },
            { "key": "following_text", "operator": "regex_contains", "operand": "^(?:\t| |\\)|]|\\}|>|$)", "match_all": true },
            { "key": "preceding_text", "operator": "not_regex_contains", "operand": "[\\*a-zA-Z0-9_]$", "match_all": true }, // --- THIS line ---
            { "key": "eol_selector", "operator": "not_equal", "operand": "string.quoted.other - punctuation.definition.string.end", "match_all": true }
        ]
    },
    { "keys": ["*"], "command": "insert_snippet", "args": {"contents": "*${0:$SELECTION}*"}, "context":
        [
            { "key": "setting.auto_match_enabled", "operator": "equal", "operand": true },
            { "key": "selection_empty", "operator": "equal", "operand": false, "match_all": true }
        ]
    },
    { "keys": ["*"], "command": "move", "args": {"by": "characters", "forward": true}, "context":
        [
            { "key": "setting.auto_match_enabled", "operator": "equal", "operand": true },
            { "key": "selection_empty", "operator": "equal", "operand": true, "match_all": true },
            { "key": "following_text", "operator": "regex_contains", "operand": "^\\*", "match_all": true }, // --- THIS line ---
            { "key": "selector", "operator": "not_equal", "operand": "punctuation.definition.string.begin", "match_all": true },
            { "key": "eol_selector", "operator": "not_equal", "operand": "string.quoted.other - punctuation.definition.string.end", "match_all": true },
        ]
    },
    { "keys": ["backspace"], "command": "run_macro_file", "args": {"file": "res://Packages/Default/Delete Left Right.sublime-macro"}, "context":
        [
            { "key": "setting.auto_match_enabled", "operator": "equal", "operand": true },
            { "key": "selection_empty", "operator": "equal", "operand": true, "match_all": true },
            { "key": "preceding_text", "operator": "regex_contains", "operand": "\\*$", "match_all": true }, // --- THIS line ---
            { "key": "following_text", "operator": "regex_contains", "operand": "^\\*", "match_all": true }, // --- THIS line ---
            { "key": "selector", "operator": "not_equal", "operand": "punctuation.definition.string.begin", "match_all": true },
            { "key": "eol_selector", "operator": "not_equal", "operand": "string.quoted.other - punctuation.definition.string.end", "match_all": true },
        ]
    },
]

Update: Actually, a different regex works better for that line:

- [*a-zA-Z0-9_]$
+ (^[*\\s]*$)|([*a-zA-Z0-9_]$)

This way asterisks are not duplicated at the beginning of a line when you work with bulleted lists.
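To illustrate what the updated pattern changes, here is a minimal sketch in Python. It rests on two assumptions: Python's `re` module stands in for the regex engine Sublime Text uses, which may differ in other details, and `auto_pair_allowed` is a hypothetical helper that mimics the `not_regex_contains` operator on `preceding_text`.

```python
import re

# Patterns as the regex engine sees them, i.e. after JSON has turned \\s into \s.
old = r"[*a-zA-Z0-9_]$"
new = r"(^[*\s]*$)|([*a-zA-Z0-9_]$)"

def auto_pair_allowed(pattern, preceding_text):
    """Hypothetical stand-in for not_regex_contains: the pairing binding
    applies only when the pattern is NOT found in the text before the caret."""
    return re.search(pattern, preceding_text) is None

# Text to the left of the caret in a few typical situations.
cases = {
    "empty line": "",
    "indented empty line": "  ",
    "after a bullet marker": "* ",
    "right after a word": "word",
    "after a word and a space": "word ",
}

for label, text in cases.items():
    print(label, auto_pair_allowed(old, text), auto_pair_allowed(new, text))
```

With the old pattern, the pairing binding still fires on an empty or whitespace-only line, which is where a list bullet is usually typed. The extra alternative `^[*\s]*$` blocks it there, while pairing after a word and a space is unaffected.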


Solution

  • In regular expressions, the * operator is special: it means zero or more repetitions of the previous atom. To match a literal * character you need to escape it as \* (written as \\* in JSON, where the backslash itself must be escaped), which tells the regex engine to treat it as a plain character rather than as the repetition operator.

    The construct [] represents a character set, which matches any one of the characters within the set. Inside a character set, the only characters that have special meaning to the regex engine are ] (closes the set), \ (still needs to be able to escape), - (specifies a range of characters) and ^ (negates the set, and is special only when it is the first character), so only those characters need to be escaped inside the set.

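    Since a character set means "match any one of the following characters", the special meaning of * does not apply there, so inside a character set it doesn't need to be escaped (although you can still do so if you want to). That is why the escape makes no difference in the 1st block, where the * sits inside [...], but is required in the 3rd and last blocks, where it stands on its own.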
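The points above can be checked with a short sketch. It uses Python's `json` and `re` modules as a stand-in, which is an assumption: Sublime Text uses its own JSON parser and regex engine, and error handling for a bare `*` may differ there. The helper names `same_matches` and `compiles` are hypothetical.

```python
import json
import re

# The operand exactly as written in the keymap file (JSON source text).
json_source = r'"[\\*a-zA-Z0-9_]$"'
escaped = json.loads(json_source)   # JSON decoding turns \\ into a single \
unescaped = "[*a-zA-Z0-9_]$"        # same set without the escape

def same_matches(p1, p2, texts):
    """True if both patterns agree (match or no match) on every sample."""
    return all(bool(re.search(p1, t)) == bool(re.search(p2, t)) for t in texts)

samples = ["foo*", "foo", "foo ", "*", "", "a-", "back\\"]
classes_agree = same_matches(escaped, unescaped, samples)

def compiles(pattern):
    """True if Python's regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(escaped)                  # the pattern the regex engine receives
print(classes_agree)            # escaped and unescaped sets behave alike
print(compiles(r"\*$"))         # escaped * outside a set is a literal
print(compiles("*$"))           # bare * has nothing to repeat
```

In Python the bare `*$` is rejected outright; another engine may instead accept it and match something unintended, which would be consistent with the incorrect behavior described in the question.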