How can I get the argument names of a function type's argument list?

I want to extract the names and types of function arguments in function pointer types. The use case is to generate wrappers around some C types like:

typedef struct { void (*f)(int x, int y); } vtable_t;

and I want those wrappers to use accurate argument names.

It is easy to get argument types using something like next(top.underlying_typedef_type.get_fields()).type.get_pointee().argument_types(), which returns a list of two integer types, but without names.

I can see that argument names are available from Cursor objects, but next(next(top.underlying_typedef_type.get_fields()).type.get_pointee().argument_types()).get_declaration() seems to return some sort of dummy cursor (<SourceLocation file None, line 0, column 0>, with no children and .spelling='').

The only way I can find to get argument names is by traversing the typedef's full cursor tree; e.g., next(next(top.get_children()).get_children()).spelling evaluates to 'x'.

So it seems possible to extract named argument lists by doing full parallel traversals of the cursor and type trees, but this strategy seems complicated and brittle. Is there a simpler way?

Solution

To get both the names and types of parameters of a function pointer declaration in the Python Clang API, traverse its declaration structure, collecting the key bits of type information:

The type is a POINTER to a FUNCTIONPROTO, and the latter has a get_result() to get the return type.
The parameters are PARM_DECL children, each of which has a type attribute.

If you traverse into the FUNCTIONPROTO parameter types using its argument_types() method, then that will miss their names, so the simplest solution is to just ignore argument_types() and only use the PARM_DECL children and their types.

However, be aware that get_children() is inherently fragile: the "children" of a node can contain a haphazard mix of AST nodes depending on the syntax being examined. The Clang C++ API has children clearly separated by role, but the C API, and consequently also the Python API, throws them all together in a single list. You may need to carefully inspect the kind of each child (and perhaps more) in order to reliably identify its role within its parent.
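
As a minimal sketch of the approach described above, the following helper collects the return type and the (name, type) pairs for a function pointer declaration. The helper name function_pointer_signature is hypothetical, not part of the Clang API. It assumes the cursor passed in (for example, the FIELD_DECL for f) has a type that is a pointer to a function prototype, and it compares child.kind.name to the string "PARM_DECL" only so that the snippet needs no imports; comparing against CursorKind.PARM_DECL would work equally well. It looks only at direct children, and I have not checked it against unusual declarators such as a function pointer that itself returns a function pointer.

```python
def function_pointer_signature(field):
    """Return (return_type, [(param_name, param_type), ...]) for `field`.

    `field` is expected to behave like a clang.cindex.Cursor whose type
    is a pointer to a function prototype (an assumption of this sketch).
    """
    # The return type comes from the type tree: POINTER -> FUNCTIONPROTO.
    proto = field.type.get_pointee()

    # The parameter names come from the cursor tree.  Keep only the
    # parameter declarations, since other kinds of children (for
    # example, type references) can appear in the same list.
    params = [
        (child.spelling, child.type.spelling)
        for child in field.get_children()
        if child.kind.name == "PARM_DECL"
    ]

    return proto.get_result().spelling, params
```

With the real bindings, one would pass it the FIELD_DECL cursor for f and get back the return type spelling together with the names x and y paired with their type spellings.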

Example program

Here is an example program that demonstrates getting the return type and parameter names and types from a function pointer declaration:

#!/usr/bin/env python3
"""Print function pointer declaration parameters."""

import sys

from clang.cindex import Config, CursorKind, Index, TypeKind


def cursor_loc(c):
  """Return the location of `c` as a string."""

  return f"{c.location.line}:{c.location.column}"


def print_type_details(t, indent_level):
  """Print details about the type `t`."""

  ind = "  " * indent_level

  print(f"{ind}kind: {t.kind}")

  if t.kind == TypeKind.POINTER:
    print(f"{ind}  pointee:")
    print_type_details(t.get_pointee(), indent_level + 2)

  elif t.kind == TypeKind.FUNCTIONPROTO:
    print(f"{ind}  return type:")
    print_type_details(t.get_result(), indent_level + 2)

    print(f"{ind}  parameter types:")
    for param_type in t.argument_types():
      print_type_details(param_type, indent_level + 2)

  # A comprehensive program would print details for more kinds of types
  # here.  The above is just what is needed to demonstrate getting the
  # details for a declaration of a pointer to a function.


def print_decl_details(c, already_printed, indent_level):
  """Print details about the declaration `c`.  `already_printed` is a
  map from location (as a string) to True for those declarations that
  have already been printed."""

  ind = "  " * indent_level

  loc = cursor_loc(c)

  if c.location:
    # Avoid printing the same declaration twice.  This technique is very
    # crude (there can be multiple distinct declarations at the same
    # location) but suffices for use in a demonstration program.
    if loc in already_printed:
      print(f"{ind}{c.kind} at {loc} (already printed)")
      return
    already_printed[loc] = True

  print(f"{ind}{c.kind} at {loc}")
  print(f"{ind}  spelling: {c.spelling}")

  if c.type:
    print(f"{ind}  type:")
    print_type_details(c.type, indent_level + 2)

  for child in c.get_children():
    print_decl_details(child, already_printed, indent_level + 1)


def main():
  # Load the Clang module.  On my Windows system, using Cygwin Python, I
  # seem to have to tell it explicitly the name of the DLL (it being on
  # the PATH is not enough).
  Config.set_library_file("/cygdrive/d/opt/winlibs-mingw64-13.2/bin/libclang.dll")
  index = Index.create()

  # Parse the C source code.
  tu = index.parse("test.c")

  # Stop if the parse produced any diagnostics.  Note that this check
  # also stops on warnings, not only on syntax errors.
  if len(tu.diagnostics) > 0:
    for d in tu.diagnostics:
      print(d)
    sys.exit(2)

  # Parse was successful.  Inspect the AST.
  print_decl_details(tu.cursor, {}, 0)


main()


# EOF

When run with test.c containing:

typedef struct {
  void (*f)(int x, int y);
} vtable_t;

it prints:

CursorKind.TRANSLATION_UNIT at 0:0
  spelling: test.c
  type:
    kind: TypeKind.INVALID
  CursorKind.STRUCT_DECL at 1:9
    spelling: vtable_t
    type:
      kind: TypeKind.RECORD
    CursorKind.FIELD_DECL at 2:10
      spelling: f
      type:
        kind: TypeKind.POINTER
          pointee:
            kind: TypeKind.FUNCTIONPROTO
              return type:
                kind: TypeKind.VOID         <--- return type
              parameter types:
                kind: TypeKind.INT
                kind: TypeKind.INT
      CursorKind.PARM_DECL at 2:17
        spelling: x                         <--- param 1 name
        type:
          kind: TypeKind.INT                <--- param 1 type
      CursorKind.PARM_DECL at 2:24
        spelling: y                         <--- param 2 name
        type:
          kind: TypeKind.INT                <--- param 2 type
  CursorKind.TYPEDEF_DECL at 3:3
    spelling: vtable_t
    type:
      kind: TypeKind.TYPEDEF
    CursorKind.STRUCT_DECL at 1:9 (already printed)

The reason the above program uses the already_printed mechanism is to avoid printing the structure declaration twice. The fact that it appears twice in the AST is due to a quirk in how Clang represents the typedef struct idiom and then how the C API exposes it (ordinarily, the Abstract Syntax Tree is, in fact, a tree).