
How can I get the argument names of a function type's argument list?


I want to extract the names and types of function arguments in function pointer types. The use case is to generate wrappers around some C types like:

typedef struct { void (*f)(int x, int y); } vtable_t;

and I want those wrappers to use accurate argument names.

It is easy to get argument types using something like next(top.underlying_typedef_type.get_fields()).type.get_pointee().argument_types(), which returns a list of two integer types, but without names.

I can see that argument names are available from Cursor objects, but next(next(node.underlying_typedef_type.get_fields()).type.get_pointee().argument_types()).get_declaration() seems to return some sort of dummy cursor (<SourceLocation file None, line 0, column 0>, with no children and .spelling='').

The only way I can find to get argument names is by traversing the typedef's full cursor tree; e.g., next(next(top.get_children()).get_children()).spelling evaluates to 'x'.

So it seems possible to extract named argument lists by doing full parallel traversals of the cursor and type trees, but this strategy seems complicated and brittle. Is there a simpler way?


Solution

  • To get both the names and types of parameters of a function pointer declaration in the Python Clang API, traverse its declaration structure, collecting the key bits of type information:

    • The type is a POINTER to a FUNCTIONPROTO, and the latter has a get_result() to get the return type.

    • The parameters are PARM_DECL children, each of which has a type attribute.

    If you traverse into the FUNCTIONPROTO parameter types using its argument_types() method, then that will miss their names, so the simplest solution is to just ignore argument_types() and only use the PARM_DECL children and their types.

    However, be aware that get_children() is inherently fragile: the "children" of a node can contain a haphazard mix of AST nodes depending on the syntax being examined. The Clang C++ API has children clearly separated by role, but the C API, and consequently also the Python API, throws them all together in a single list. You may need to carefully inspect the kind of each child (and perhaps more) in order to reliably identify its role within its parent.
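
    As a minimal sketch of the approach described above, the helper below takes the return type from the pointee FUNCTIONPROTO and the parameter names and types from the PARM_DECL children, skipping children of any other kind. The name describe_function_pointer is hypothetical, not part of clang.cindex. Kinds are compared by their name strings only so that the sketch carries no import of its own; comparing against TypeKind.POINTER, TypeKind.FUNCTIONPROTO and CursorKind.PARM_DECL would be the more usual style.

```python
def describe_function_pointer(field_cursor):
    """Return (return_type, [(param_name, param_type), ...]) for a cursor
    whose type is a pointer to a function prototype, or None otherwise.

    Types are reported by their `spelling` strings.  This helper is a
    hypothetical illustration, not part of clang.cindex.
    """
    t = field_cursor.type

    # Assumption of this sketch: kinds are compared by name so that the
    # function needs no import; TypeKind/CursorKind constants also work.
    if t.kind.name != "POINTER":
        return None
    fn_type = t.get_pointee()
    if fn_type.kind.name != "FUNCTIONPROTO":
        return None

    params = []
    for child in field_cursor.get_children():
        # get_children() mixes roles, so keep only parameter declarations.
        if child.kind.name == "PARM_DECL":
            params.append((child.spelling, child.type.spelling))

    return fn_type.get_result().spelling, params
```

    Called on the FIELD_DECL cursor for f in the question's vtable_t, this should return the spelling of the return type together with one (name, type) pair each for x and y. It only looks at direct children, so it does not cover a field whose type is itself a typedef of a function pointer type.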

    Example program

    Here is an example program that demonstrates getting the return type and parameter names and types from a function pointer declaration:

    #!/usr/bin/env python3
    """Print function pointer declaration parameters."""
    
    import sys
    
    from clang.cindex import Config, CursorKind, Index, TypeKind
    
    
    def cursor_loc(c):
      """Return the location of `c` as a string."""
    
      return f"{c.location.line}:{c.location.column}"
    
    
    def print_type_details(t, indent_level):
      """Print details about the type `t`."""
    
      ind = "  " * indent_level
    
      print(f"{ind}kind: {t.kind}")
    
      if t.kind == TypeKind.POINTER:
        print(f"{ind}  pointee:")
        print_type_details(t.get_pointee(), indent_level + 2)
    
      elif t.kind == TypeKind.FUNCTIONPROTO:
        print(f"{ind}  return type:")
        print_type_details(t.get_result(), indent_level + 2)
    
        print(f"{ind}  parameter types:")
        for param_type in t.argument_types():
          print_type_details(param_type, indent_level + 2)
    
      # A comprehensive program would print details for more kinds of types
      # here.  The above is just what is needed to demonstrate getting the
      # details for a declaration of a pointer to a function.
    
    
    def print_decl_details(c, already_printed, indent_level):
      """Print details about the declaration `c`.  `already_printed` is a
      map from location (as a string) to True for those declarations that
      have already been printed."""
    
      ind = "  " * indent_level
    
      loc = cursor_loc(c)
    
      if c.location:
        # Avoid printing the same declaration twice.  This technique is very
        # crude (there can be multiple distinct declarations at the same
        # location) but suffices for use in a demonstration program.
        if loc in already_printed:
          print(f"{ind}{c.kind} at {loc} (already printed)")
          return
        already_printed[loc] = True
    
      print(f"{ind}{c.kind} at {loc}")
      print(f"{ind}  spelling: {c.spelling}")
    
      if c.type:
        print(f"{ind}  type:")
        print_type_details(c.type, indent_level + 2)
    
      for child in c.get_children():
        print_decl_details(child, already_printed, indent_level + 1)
    
    
    def main():
      # Load the Clang module.  On my Windows system, using Cygwin Python, I
      # seem to have to tell it explicitly the name of the DLL (it being on
      # the PATH is not enough).
      Config.set_library_file("/cygdrive/d/opt/winlibs-mingw64-13.2/bin/libclang.dll")
      index = Index.create()
    
      # Parse the C source code.
      tu = index.parse("test.c")
    
      # Stop if there were syntax errors.
      if len(tu.diagnostics) > 0:
        for d in tu.diagnostics:
          print(d)
        sys.exit(2)
    
      # Parse was successful.  Inspect the AST.
      print_decl_details(tu.cursor, {}, 0)
    
    
    main()
    
    
    # EOF
    

    When run with test.c containing:

    typedef struct {
      void (*f)(int x, int y);
    } vtable_t;
    

    it prints:

    CursorKind.TRANSLATION_UNIT at 0:0
      spelling: test.c
      type:
        kind: TypeKind.INVALID
      CursorKind.STRUCT_DECL at 1:9
        spelling: vtable_t
        type:
          kind: TypeKind.RECORD
        CursorKind.FIELD_DECL at 2:10
          spelling: f
          type:
            kind: TypeKind.POINTER
              pointee:
                kind: TypeKind.FUNCTIONPROTO
                  return type:
                    kind: TypeKind.VOID         <--- return type
                  parameter types:
                    kind: TypeKind.INT
                    kind: TypeKind.INT
          CursorKind.PARM_DECL at 2:17
            spelling: x                         <--- param 1 name
            type:
              kind: TypeKind.INT                <--- param 1 type
          CursorKind.PARM_DECL at 2:24
            spelling: y                         <--- param 2 name
            type:
              kind: TypeKind.INT                <--- param 2 type
      CursorKind.TYPEDEF_DECL at 3:3
        spelling: vtable_t
        type:
          kind: TypeKind.TYPEDEF
        CursorKind.STRUCT_DECL at 1:9 (already printed)
    

    The reason the above program uses the already_printed mechanism is to avoid printing the structure declaration twice. The fact that it appears twice in the AST is due to a quirk in how Clang represents the typedef struct idiom and then how the C API exposes it (ordinarily, the Abstract Syntax Tree is, in fact, a tree).
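
    The location-string check is, as its comment admits, crude. One possible refinement, sketched below, is to key the "already seen" set on something more specific to the declaration. The generator walk_unique is a hypothetical helper, not part of clang.cindex; its default key uses the Cursor.hash property, on the assumption (not checked against the output above) that both appearances of the structure declaration report the same hash.

```python
def walk_unique(root, key=lambda c: c.hash):
    """Yield (depth, cursor) pairs in pre-order, visiting each cursor at
    most once according to `key`.

    Hypothetical helper: `key` maps a cursor to a hashable identity.  The
    default assumes that repeated appearances of one declaration share
    the same `hash` value.
    """
    seen = set()

    def visit(c, depth):
        k = key(c)
        if k in seen:
            # Already reported elsewhere in the traversal; skip the
            # whole subtree.
            return
        seen.add(k)
        yield depth, c
        for child in c.get_children():
            yield from visit(child, depth + 1)

    yield from visit(root, 0)
```

    With this, the printing loop could become something like "for depth, c in walk_unique(tu.cursor): ...", using depth for indentation. Unlike the example program, this version silently skips a repeated declaration rather than printing an "(already printed)" line for it.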