libclang defines only 5 kinds of tokens:
CXToken_Punctuation
CXToken_Keyword
CXToken_Identifier
CXToken_Literal
CXToken_Comment
Is it possible to get more detailed information about tokens? For example, for the following source code:
struct Type;
void foo(Type param);
I would expect the output to be like:
I also need to map those entities to file locations.
First, you probably need a bit of background on how parsing works; a textbook on compilers would be a useful resource. In short, the file is first converted into a series of tokens, which gives you identifiers, punctuation, and so on. The code that does this is called a lexer. Then the parser runs and converts the list of tokens into an AST (structured declarations, expressions, etc.).
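To make the lexing step concrete, here is a toy lexer in C. It is a hypothetical sketch, not clang's lexer: the names `toy_lex`, `Tok` and `TokKind` are made up, it knows only a tiny keyword list, single-character punctuation and `//` comments, and it merely mirrors libclang's five token kinds. Run on the snippet from the question, it labels `foo`, `Type` and `param` all as plain identifiers, which is exactly the limitation being asked about: telling a function name from a type name is the parser's job, not the lexer's.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds that mirror libclang's five CXTokenKind values. */
typedef enum { TOK_PUNCTUATION, TOK_KEYWORD, TOK_IDENTIFIER, TOK_LITERAL, TOK_COMMENT } TokKind;

typedef struct {
    TokKind kind;
    size_t offset; /* byte offset of the token's first character in the source */
    size_t length; /* number of bytes in the token */
} Tok;

/* Tiny, deliberately incomplete keyword table (assumption for illustration). */
static int is_keyword(const char *s, size_t n) {
    static const char *const keywords[] = { "struct", "void", "int", "return", NULL };
    for (size_t k = 0; keywords[k] != NULL; ++k)
        if (strlen(keywords[k]) == n && strncmp(keywords[k], s, n) == 0)
            return 1;
    return 0;
}

/* Splits src into tokens, writing at most max of them to out; returns the count. */
size_t toy_lex(const char *src, Tok *out, size_t max) {
    size_t i = 0, n = 0;
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        size_t start = i;
        if (isspace(c)) {            /* whitespace separates tokens but is not one */
            ++i;
            continue;
        }
        if (isalpha(c) || c == '_') { /* identifier or keyword */
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                ++i;
            out[n].kind = is_keyword(src + start, i - start) ? TOK_KEYWORD : TOK_IDENTIFIER;
        } else if (isdigit(c)) {      /* crude numeric literal */
            while (isalnum((unsigned char)src[i]))
                ++i;
            out[n].kind = TOK_LITERAL;
        } else if (c == '/' && src[i + 1] == '/') { /* line comment */
            while (src[i] != '\0' && src[i] != '\n')
                ++i;
            out[n].kind = TOK_COMMENT;
        } else {                      /* everything else: one-character punctuation */
            ++i;
            out[n].kind = TOK_PUNCTUATION;
        }
        out[n].offset = start;
        out[n].length = i - start;
        ++n;
    }
    return n;
}
```

Calling `toy_lex("struct Type;\nvoid foo(Type param);\n", toks, 32)` yields ten tokens, each carrying only a kind and a position; nothing in a token says that `foo` is being declared as a function.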
clang does keep track of the various parts of declarations and expressions, but not in the way you're describing. For a given function declaration, it keeps track of things like the location of the name of the function and the start of the parameter list, but it keeps those in terms of locations in the file, not tokens.
A CXToken is just a token; there isn't any additional associated semantic information beyond the five kinds you listed. (You can get the actual text of a token with clang_getTokenSpelling, and its location with clang_getTokenExtent.) clang_annotateTokens gives you CXCursors, which let you examine the relevant declarations.
Note that some details aren't exposed by the libclang API; if you need more detail, you might need to use clang's C++ API instead.