Suppose that I have the following code:
struct S {
int abcd = 0;
};
int main() {
S s;
return s.abcd;
}
The corresponding AST part:
-FunctionDecl 0x563ddd3a3e20 <line:5:1, line:8:1> line:5:5 main 'int ()'
  `-CompoundStmt 0x563ddd3a4570 <col:12, line:8:1>
    |-DeclStmt 0x563ddd3a44e0 <line:6:1, col:4>
    | `-VarDecl 0x563ddd3a3f40 <col:1, col:3> col:3 used s 'S' callinit
    |   `-CXXConstructExpr 0x563ddd3a44b8 <col:3> 'S' 'void () noexcept'
    `-ReturnStmt 0x563ddd3a4560 <line:7:1, col:10>
      `-ImplicitCastExpr 0x563ddd3a4548 <col:8, col:10> 'int' <LValueToRValue>
        `-MemberExpr 0x563ddd3a4518 <col:8, col:10> 'int' lvalue .abcd 0x563ddd3a3d10
          `-DeclRefExpr 0x563ddd3a44f8 <col:8> 'S' lvalue Var 0x563ddd3a3f40 's' 'S'
The problem is: according to the AST, the return statement ends at column 10 (<line:7:1, col:10>), while in the source it ends at column 13.
BUT if we put parentheses around the member access, the AST reports the expected extent:
int main() {
S s;
return (s.abcd);
}
`-FunctionDecl 0x562842792e20 <line:5:1, line:8:1> line:5:5 main 'int ()'
  `-CompoundStmt 0x562842793590 <col:12, line:8:1>
    |-DeclStmt 0x5628427934e0 <line:6:1, col:4>
    | `-VarDecl 0x562842792f40 <col:1, col:3> col:3 used s 'S' callinit
    |   `-CXXConstructExpr 0x5628427934b8 <col:3> 'S' 'void () noexcept'
    `-ReturnStmt 0x562842793580 <line:7:1, col:15>
      `-ImplicitCastExpr 0x562842793568 <col:8, col:15> 'int' <LValueToRValue>
        `-ParenExpr 0x562842793548 <col:8, col:15> 'int' lvalue
          `-MemberExpr 0x562842793518 <col:9, col:11> 'int' lvalue .abcd 0x562842792d10
            `-DeclRefExpr 0x5628427934f8 <col:9> 'S' lvalue Var 0x562842792f40 's' 'S'
Also, you can see that ParenExpr spans columns 8-15, while MemberExpr spans columns 9-11, which exposes the strangeness of the latter AST node.
Am I missing something?
I'm doing source-to-source transformations, and I would like to get the correct size of expressions/statements. Right now I have no idea how to do it. getEndLoc() for the original return statement also returns the location of the . (dot operator).
clang version 9.0.1
I decided to ask for help on the cfe-dev mailing list; here's what I've learned.
It seems I completely misunderstood the meaning of getBeginLoc()/getEndLoc(). getBeginLoc() returns the location of the beginning of the first token, while getEndLoc() returns the location of the beginning of the last token, not its end.
To get the end of a token, one can use Lexer::getLocForEndOfToken(...).
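To make the column numbers from the AST dumps concrete, here is a small self-contained sketch that does not use Clang at all. It only models the idea: a range stores the start columns of its first and last tokens, and the real end is found by measuring the last token. The names TokenRange, tokenLength, endOfLastToken and textOf are hypothetical helpers invented for this illustration, and the "lexer" is a deliberate simplification that only understands identifiers, numbers and single-character punctuation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a token range: 1-based columns at which the FIRST and
// the LAST token of a node begin, as printed in the dump (<col:B, col:E>).
// Hypothetical type for illustration only; this is not a Clang class.
struct TokenRange {
    std::size_t beginCol;
    std::size_t endCol;
};

// Length of the token that starts at 1-based column `col`.
// Simplified assumption: a token is either a run of [A-Za-z0-9_]
// or a single punctuation character.
std::size_t tokenLength(const std::string& line, std::size_t col) {
    std::size_t i = col - 1;
    if (i >= line.size()) return 0;
    auto isWord = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!isWord(static_cast<unsigned char>(line[i]))) return 1;
    std::size_t j = i;
    while (j < line.size() && isWord(static_cast<unsigned char>(line[j]))) ++j;
    return j - i;
}

// 1-based column just past the last character of the last token
// (a half-open end, the kind of position an end-of-token lookup yields).
std::size_t endOfLastToken(const std::string& line, TokenRange r) {
    return r.endCol + tokenLength(line, r.endCol);
}

// Source text covered by the node: from the start of the first token
// through the end of the last token.
std::string textOf(const std::string& line, TokenRange r) {
    return line.substr(r.beginCol - 1, endOfLastToken(line, r) - r.beginCol);
}
```

With the line `return s.abcd;` and the ReturnStmt range <col:1, col:10>, the last token `abcd` starts at column 10 and is 4 characters long, so the statement really ends at column 13, which matches the source. In a real Clang tool the token-measuring step is what Lexer::getLocForEndOfToken does for you.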
Documentation might also be helpful: https://clang.llvm.org/docs/InternalsManual.html#sourcerange-and-charsourcerange