Tree sitter query: optionally step into a node


I want to extract all function names from some C++ source code via a tree-sitter query (link to playground).

Here is my query:

(function_definition
    declarator: [
        (reference_declarator (function_declarator declarator: (_) @name))
        (function_declarator declarator: (_) @name)
    ]
 ) @func

And here is the source code that I'm testing it on:

void sayHello() {
    cout << "Hello, World!" << endl;
}

blah::mlah cram::greet(const string& name) {
    cout << "Hello, " << name << "!" << endl;
}

blah::mlah& greet(const string& name) {
    cout << "Hello, " << name << "!" << endl;
}

blah::mlah& cram::greet(const string& name) {
    cout << "Hello, " << name << "!" << endl;
}

  1. It seems like a bug to me that, when a function returns a reference to an object, I must go through the reference_declarator node to reach the function_declarator node.

  2. Imperatively, I could optionally step into the reference_declarator. But queries do not traverse; they match.
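
To make point 2 concrete, here is roughly what the imperative version would look like. This is only a sketch: the `Node` class and the `make_func` helper are hypothetical stand-ins that mimic the node shapes from the query above, not the tree-sitter API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not the tree-sitter API)."""
    type: str
    text: str = ""
    fields: dict = field(default_factory=dict)    # field name -> child Node
    children: list = field(default_factory=list)  # children without a field name


def function_name(func_def: Node) -> Optional[str]:
    """Return the declared name of a function_definition, or None."""
    decl = func_def.fields.get("declarator")
    # Optionally step into a reference_declarator wrapper.
    if decl is not None and decl.type == "reference_declarator":
        decl = next(
            (c for c in decl.children if c.type == "function_declarator"), None
        )
    if decl is None or decl.type != "function_declarator":
        return None
    name = decl.fields.get("declarator")
    return name.text if name is not None else None


def make_func(name: str, by_reference: bool) -> Node:
    """Build a toy function_definition (simplified: the name is a single node)."""
    fdecl = Node("function_declarator",
                 fields={"declarator": Node("identifier", text=name)})
    outer = Node("reference_declarator", children=[fdecl]) if by_reference else fdecl
    return Node("function_definition", fields={"declarator": outer})


names = [
    function_name(make_func("sayHello", by_reference=False)),
    function_name(make_func("greet", by_reference=True)),
]
```

The single `if` that unwraps the reference_declarator is the "optional step" I cannot find a query equivalent for.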

I read through the tree-sitter query documentation and couldn't figure out a way to shorten my query. I want to say "given a function definition, find the name of its declarator", but I do not know how to express that with the existing grammar.


Solution

  • A tree-sitter query is essentially a path to a node. Just as you can't skip directories when writing a file path, you can't skip a node type when writing a tree-sitter query.

    So if the target node can appear in two different contexts, you need two different paths. I've come across this recently with decorated and undecorated functions in Python. You need two distinct patterns to get both, whether as separate queries or as the branches of an alternation like the one in your query.

    Interestingly though, these paths don't need to start at the root or anywhere in particular. So if the context of function_definition > reference_declarator is not required to disambiguate, you could just use:

    (function_declarator declarator: (_) @name)
    

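    to get all the function names. Note that, without the function_definition parent, this pattern also matches function declarators that are not part of a definition, such as prototypes.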
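
The reason the shorter pattern works can be shown with a toy model. This is a rough sketch, assuming a hypothetical `Node` class and `make_func` helper rather than the real tree-sitter API: a pattern that is not anchored to a parent is tried against every node in the tree, so it finds the function_declarator whether or not a reference_declarator sits above it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not the tree-sitter API)."""
    type: str
    text: str = ""
    fields: dict = field(default_factory=dict)    # field name -> child Node
    children: list = field(default_factory=list)  # children without a field name


def walk(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in list(node.fields.values()) + node.children:
        yield from walk(child)


def declarator_names(root):
    """Toy equivalent of (function_declarator declarator: (_) @name)."""
    return [
        n.fields["declarator"].text
        for n in walk(root)
        if n.type == "function_declarator" and "declarator" in n.fields
    ]


def make_func(name, by_reference):
    """Build a toy function_definition (simplified: the name is a single node)."""
    fdecl = Node("function_declarator",
                 fields={"declarator": Node("identifier", text=name)})
    outer = Node("reference_declarator", children=[fdecl]) if by_reference else fdecl
    return Node("function_definition", fields={"declarator": outer})


# Mirrors the four functions from the question.
root = Node("translation_unit", children=[
    make_func("sayHello", False),
    make_func("cram::greet", False),
    make_func("greet", True),
    make_func("cram::greet", True),
])
found = declarator_names(root)
```

The check never looks at the parent of the function_declarator, which is why the wrapper node stops mattering.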