vscode-extensions, tmlanguage

TextMate language multiline pattern for trailing commas


I'm going completely insane trying to write a TextMate pattern to catch trailing commas. A comma is trailing if it is followed by any amount of whitespace or newlines and then a closing parenthesis ).

ex. 1

(a, b, c,) <- the last comma here is trailing

ex. 2

(a, b
, c
,) <- the last comma here is trailing

My understanding is that patterns in TextMate are not applied across multiple lines unless you use "begin" and "end". In that case, the inner patterns you specify are applied to the text between "begin" and "end".

My current best attempt is:

"begin": "\\(",
"end": "\\)",
"patterns": [
    {
        "name": "invalid.trailing.comma.json",
        "match": ",\\s*\\)"
    }
]

This handles ex. 1 but fails on ex. 2, even though my understanding is that the block you create by specifying a begin and end should be treated as a contiguous chunk of text.

I am hoping somebody can help me correct whatever basic misunderstandings I clearly have. The documentation on TextMate is not good. I'm not even sure what I'm trying to do is possible.


Solution

  • I'm not even sure what I'm trying to do is possible.

    You would be correct: it is impossible to fully detect all trailing commas.

    Your first and second examples are easy,
    but a comma sitting by itself on its own line is not so easy.

    (
    a, // this is valid
    b
    , // this is also valid
    c
    , // this is invalid (and impossible to detect correctly)
    )
    
    "name": "invalid.trailing.comma.json",
    "match": ",\\s*\\)"
    

    This would work for the easy case, where the trailing comma , is on the same line as the closing bracket ).
    However, because the match consumes the bracket, the enclosing rule's end pattern cannot match it and gets pushed along to the next bracket ).
    I would recommend using a lookahead instead, so that the bracket is not consumed:
    ,(?=\\s*\\))
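
    As a minimal sketch, this is the rule from the question with the lookahead swapped in. The scope name is kept from the question, and the "comment" key is only there for annotation. It still covers only the case where the comma and the closing bracket share a line, because each regex is matched against one line at a time.

    ```json
    {
        "begin": "\\(",
        "end": "\\)",
        "patterns": [
            {
                "comment": "The lookahead checks for the bracket without consuming it, so the outer end pattern can still match.",
                "name": "invalid.trailing.comma.json",
                "match": ",(?=\\s*\\))"
            }
        ]
    }
    ```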

    What you can do instead of invalidating the comma is invalidate the bracket.

    "begin": "\\(",
    "end": "\\)",
    "beginCaptures": { "0": { "name": "bracket.begin.json" } },
    "endCaptures": { "0": { "name": "bracket.end.json" } },
    "name": "brackets.json",
    "patterns": [
        {
            "begin": ",",
            "end": "(?=(\\))|\\S)",
            "beginCaptures": { "0": { "name": "comma.json" } },
            "endCaptures": { "1": { "name": "invalid.expected-json-item.json" } }
        }
    ]
    

    The begin starts at the comma, and the end stops at the bracket OR at any non-whitespace character.
    Putting a capture group inside the lookahead still lets us apply a scope name to the bracket without actually consuming the text.
    However, VS Code's TextMate implementation has a bug where it then won't apply the bracket.end.json scope name to the bracket, but that doesn't really matter much.
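
    To try the rule out, it has to sit inside a complete grammar file. The following is only a sketch of how that might look: the top-level "name" and "scopeName" values and the repository key names ("brackets", "comma") are placeholders invented for illustration, not part of the rule above.

    ```json
    {
        "comment": "Hypothetical wrapper grammar; name and scopeName are placeholders.",
        "name": "Example",
        "scopeName": "source.example",
        "patterns": [
            { "include": "#brackets" }
        ],
        "repository": {
            "brackets": {
                "name": "brackets.json",
                "begin": "\\(",
                "end": "\\)",
                "beginCaptures": { "0": { "name": "bracket.begin.json" } },
                "endCaptures": { "0": { "name": "bracket.end.json" } },
                "patterns": [
                    { "include": "#comma" }
                ]
            },
            "comma": {
                "comment": "Spans from a comma up to the next non-whitespace character, across lines if needed.",
                "begin": ",",
                "end": "(?=(\\))|\\S)",
                "beginCaptures": { "0": { "name": "comma.json" } },
                "endCaptures": { "1": { "name": "invalid.expected-json-item.json" } }
            }
        }
    }
    ```

    In a VS Code extension, a file like this would be registered under contributes.grammars in package.json, with a scopeName matching the one in the file.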

  • My understanding is that patterns in textmate are not applied across multiple lines unless you use "begin" and "end"

    That is correct.
    The regexes themselves are still not multi-line, though: each one is matched against a single line at a time.
    The items in the patterns array are checked before the end pattern, so don't get caught out by that.
    https://github.com/RedCMD/TmLanguage-Syntax-Highlighter/blob/main/documentation/rules.md#begin