I want to extract the names and types of function arguments in function pointer types. The use case is to generate wrappers around some C types like:
typedef struct { void (*f)(int x, int y); } vtable_t;
and I want those wrappers to use accurate argument names.
It is easy to get argument types using something like
next(top.underlying_typedef_type.get_fields()).type.get_pointee().argument_types()
which returns a list of two integer types, but without names.
I can see that argument names are available from Cursor objects, but
next(next(top.underlying_typedef_type.get_fields()).type.get_pointee().argument_types()).get_declaration()
seems to return some sort of dummy cursor (its location is <SourceLocation file None, line 0, column 0>, it has no children, and its .spelling is '').
The only way I can find to get argument names is by traversing the typedef's full cursor tree; e.g., next(next(top.get_children()).get_children()).spelling evaluates to 'x'.
So it seems possible to extract named argument lists by doing full parallel traversals of the cursor and type trees, but this strategy seems complicated and brittle. Is there a simpler way?
To get both the names and types of parameters of a function pointer declaration in the Python Clang API, traverse its declaration structure, collecting the key bits of type information:
- The type is a POINTER to a FUNCTIONPROTO, and the latter has a get_result() method to get the return type.
- The parameters are PARM_DECL children, each of which has a type attribute.
If you traverse into the FUNCTIONPROTO parameter types using its argument_types() method, then that will miss their names, so the simplest solution is to just ignore argument_types() and only use the PARM_DECL children and their types.
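As a rough sketch of the approach described above, the hypothetical helper below takes a declaration cursor (such as a FIELD_DECL) and returns its name, its return type, and its (name, type) parameter pairs. The names FunctionPointerSignature and function_pointer_signature are mine, not part of clang.cindex. It compares kinds by their .name so the snippet itself does not need to import clang.cindex; comparing against TypeKind.POINTER and CursorKind.PARM_DECL directly should work equally well. It looks only at direct children and assumes the declaration's type is written as a pointer to a prototyped function rather than through a typedef.

```python
from typing import List, NamedTuple, Optional, Tuple


class FunctionPointerSignature(NamedTuple):
    """Hypothetical container for what is extracted from one declaration."""
    name: str
    return_type: str
    params: List[Tuple[str, str]]  # (parameter name, parameter type spelling)


def function_pointer_signature(decl) -> Optional[FunctionPointerSignature]:
    """Return the signature of `decl` if its type is a pointer to a
    prototyped function, otherwise None.

    `decl` is expected to behave like a clang.cindex.Cursor."""
    t = decl.type
    if t.kind.name != "POINTER":
        return None
    fn = t.get_pointee()
    if fn.kind.name != "FUNCTIONPROTO":
        return None

    # Names live on the PARM_DECL cursors, not on the function type, so
    # take both the name and the type from the children.
    params = [(child.spelling, child.type.spelling)
              for child in decl.get_children()
              if child.kind.name == "PARM_DECL"]

    return FunctionPointerSignature(decl.spelling,
                                    fn.get_result().spelling,
                                    params)
```

Called on the FIELD_DECL cursor for f in the vtable_t example from the question, this should give the name 'f', the return type 'void', and the parameters [('x', 'int'), ('y', 'int')]. Unnamed parameters would come back with an empty name.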
However, one should be aware that there is an inherent fragility to get_children() in that the "children" of a node can contain a haphazard mix of AST nodes depending on the syntax being examined. The Clang C++ API has children clearly separated by role, but the C API, and consequently also the Python API, throws them all together in a single list. You may need to carefully inspect the kind of each child (and perhaps more) in order to reliably identify its role within its parent.
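One way to make the role of each child more explicit is to bucket the children by kind before using them, so that anything unexpected becomes visible instead of being silently mixed in. The helper below is a minimal sketch; children_by_kind is a hypothetical name of my own, and it assumes the kind alone is enough to tell roles apart, which, as noted above, may not always hold.

```python
from collections import defaultdict


def children_by_kind(cursor):
    """Group the direct children of `cursor` by the name of their kind.

    `cursor` is expected to behave like a clang.cindex.Cursor.  Children
    keep their original order within each group."""
    groups = defaultdict(list)
    for child in cursor.get_children():
        groups[child.kind.name].append(child)
    return dict(groups)
```

For a function pointer field, children_by_kind(field).get("PARM_DECL", []) would then be the parameter list, and any other keys in the result show which additional node kinds were present for that particular piece of syntax.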
Here is an example program that demonstrates getting the return type and parameter names and types from a function pointer declaration:
#!/usr/bin/env python3
"""Print function pointer declaration parameters."""

import sys

from clang.cindex import Config, CursorKind, Index, TypeKind


def cursor_loc(c):
    """Return the location of `c` as a string."""
    return f"{c.location.line}:{c.location.column}"


def print_type_details(t, indent_level):
    """Print details about the type `t`."""
    ind = " " * indent_level
    print(f"{ind}kind: {t.kind}")

    if t.kind == TypeKind.POINTER:
        print(f"{ind} pointee:")
        print_type_details(t.get_pointee(), indent_level + 2)

    elif t.kind == TypeKind.FUNCTIONPROTO:
        print(f"{ind} return type:")
        print_type_details(t.get_result(), indent_level + 2)
        print(f"{ind} parameter types:")
        for param_type in t.argument_types():
            print_type_details(param_type, indent_level + 2)

    # A comprehensive program would print details for more kinds of types
    # here. The above is just what is needed to demonstrate getting the
    # details for a declaration of a pointer to a function.


def print_decl_details(c, already_printed, indent_level):
    """Print details about the declaration `c`. `already_printed` is a
    map from location (as a string) to True for those declarations that
    have already been printed."""
    ind = " " * indent_level
    loc = cursor_loc(c)

    if c.location:
        # Avoid printing the same declaration twice. This technique is very
        # crude (there can be multiple distinct declarations at the same
        # location) but suffices for use in a demonstration program.
        if loc in already_printed:
            print(f"{ind}{c.kind} at {loc} (already printed)")
            return
        already_printed[loc] = True

    print(f"{ind}{c.kind} at {loc}")
    print(f"{ind} spelling: {c.spelling}")

    if c.type:
        print(f"{ind} type:")
        print_type_details(c.type, indent_level + 2)

    for child in c.get_children():
        print_decl_details(child, already_printed, indent_level + 1)


def main():
    # Load the Clang module. On my Windows system, using Cygwin Python, I
    # seem to have to tell it explicitly the name of the DLL (it being on
    # the PATH is not enough).
    Config.set_library_file("/cygdrive/d/opt/winlibs-mingw64-13.2/bin/libclang.dll")
    index = Index.create()

    # Parse the C source code.
    tu = index.parse("test.c")

    # Stop if there were syntax errors.
    if len(tu.diagnostics) > 0:
        for d in tu.diagnostics:
            print(d)
        sys.exit(2)

    # Parse was successful. Inspect the AST.
    print_decl_details(tu.cursor, {}, 0)


main()

# EOF
When run with test.c containing:
typedef struct {
  void (*f)(int x, int y);
} vtable_t;
it prints:
CursorKind.TRANSLATION_UNIT at 0:0
 spelling: test.c
 type:
  kind: TypeKind.INVALID
 CursorKind.STRUCT_DECL at 1:9
  spelling: vtable_t
  type:
   kind: TypeKind.RECORD
  CursorKind.FIELD_DECL at 2:10
   spelling: f
   type:
    kind: TypeKind.POINTER
     pointee:
      kind: TypeKind.FUNCTIONPROTO
       return type:
        kind: TypeKind.VOID <--- return type
       parameter types:
        kind: TypeKind.INT
        kind: TypeKind.INT
   CursorKind.PARM_DECL at 2:17
    spelling: x <--- param 1 name
    type:
     kind: TypeKind.INT <--- param 1 type
   CursorKind.PARM_DECL at 2:24
    spelling: y <--- param 2 name
    type:
     kind: TypeKind.INT <--- param 2 type
 CursorKind.TYPEDEF_DECL at 3:3
  spelling: vtable_t
  type:
   kind: TypeKind.TYPEDEF
  CursorKind.STRUCT_DECL at 1:9 (already printed)
The reason the above program uses the already_printed mechanism is to avoid printing the structure declaration twice. The fact that it appears twice in the AST is due to a quirk in how Clang represents the typedef struct idiom and then how the C API exposes it (ordinarily, the Abstract Syntax Tree is, in fact, a tree).
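If the location-based check feels too crude, one possible alternative is to key the "already seen" set on the cursor's hash property (which, as far as I know, wraps clang_hashCursor) instead of its line and column. The generator below is only a sketch under that assumption; walk_unique is a hypothetical name, and since hashes can in principle collide, a more careful version might also compare the cursors themselves.

```python
def walk_unique(cursor, seen=None):
    """Yield `cursor` and its descendants in pre-order, visiting each
    distinct cursor at most once.

    `cursor` is expected to behave like a clang.cindex.Cursor; `seen` is
    the set of cursor hashes visited so far (created on the first call)."""
    if seen is None:
        seen = set()

    key = cursor.hash
    if key in seen:
        # Reached again through another parent, as with `typedef struct`.
        return
    seen.add(key)

    yield cursor
    for child in cursor.get_children():
        yield from walk_unique(child, seen)
```

With this, a loop such as "for c in walk_unique(tu.cursor): ..." would visit the STRUCT_DECL only under the translation unit and skip its second appearance under the TYPEDEF_DECL, assuming both appearances hash identically.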