Search code examples
parsingoperator-precedencecontext-free-grammartreesitter

In a tree-sitter grammar, is it possible for operator precedence/associativity conflict to cause a runtime parse failure?


Consider an infix operator like subset (⊂). The subset operator is not associative, because its result (a boolean) is not itself a set and so cannot be fed into one or another side of the subset operator. Consider:

S ⊂ T ⊂ M

Ideally this would be a parse failure, but tree-sitter does not seem to allow parse failures based on operator conflict; instead, it requires you unambiguously resolve the conflict at parser generation time by specifying associativity or precedence. Is there any way to indicate to tree-sitter this should be a parse conflict? Not only for non-associative operators of the same kind, but also between different operators with equivalent precedence which are not associative, like:

S ⊂ T ⊆ M

Or is the only solution to specify an unambiguous parse, then handle this at the semantic level?


Solution

  • You are correct this should be handled at the semantic level. So ⊂ should be marked left-associative in the grammar for parsing purposes, even though it is not. For the string S ⊂ T ⊂ M it will then be parsed as:

    (op ⊂
      (op ⊂
        (id S)
        (id T)
      )
      (id M)
    )
    

    At the semantic level you can then add a rule checking whether has any child nodes which are also (or any other operator of equal precedence), which you can surface as an associativity/precedence conflict error.