Why is the source location end off by two characters for a statement ending in a semicolon?


I'm trying to write a source to source translator using libTooling.

I'm using ASTMatchers to find if statements that don't have curly braces, and then a Rewriter to add the braces.

The matcher I'm using is:

ifStmt(unless(hasDescendant(compoundStmt()))).bind("ifStmt")

Then I just get the start and end locations, and rewrite the curly braces.

Here's the source code for that:

if (const IfStmt *IfS = Result.Nodes.getNodeAs<clang::IfStmt>("ifStmt")) {
  const Stmt *Then = IfS->getThen();
  // Wrap the "then" branch in braces.
  Rewrite.InsertText(Then->getLocStart(), "{", true, true);
  Rewrite.InsertText(Then->getLocEnd(), "}", true, true);
}

The problem is that the end location is always off by two characters. Why is that?


Solution

  • This is a general issue with the Clang AST: it usually does not record the location of the final semicolon of a statement that ends in one. See the discussion "Extend Stmt with proper end location?" on the LLVM Discourse server.

    To solve this problem, the usual approach is to start with the end location as stored in the AST, then use the Lexer class to advance forward until the semicolon is found. This is not 100% reliable because there can be intervening macros and preprocessing directives, but fortunately that is uncommon for the final semicolon of a statement.
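
The idea can be illustrated without any Clang dependency. The sketch below (C++17) works on a plain string and byte offsets instead of SourceLocations, and every function name in it is hypothetical rather than part of Clang. It assumes, as Clang does, that the end of a range points at the start of the last token, which is why a brace inserted at the raw end location lands before that token instead of after the semicolon. The token model is deliberately crude (identifiers and numbers only), and macros and preprocessing directives are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: return the offset just past the token starting at pos.
// Tokens are modelled as a run of [A-Za-z0-9_] or else a single character.
std::size_t endOfToken(const std::string &src, std::size_t pos) {
  assert(pos < src.size());
  auto isWord = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  if (!isWord(src[pos]))
    return pos + 1;
  while (pos < src.size() && isWord(src[pos]))
    ++pos;
  return pos;
}

// Hypothetical helper: skip whitespace and comments starting at pos and
// return the offset of the next character if it is ';', otherwise nothing.
std::optional<std::size_t> findSemiAfterOffset(const std::string &src,
                                               std::size_t pos) {
  assert(pos <= src.size());
  while (pos < src.size()) {
    if (std::isspace(static_cast<unsigned char>(src[pos]))) {
      ++pos;
    } else if (src.compare(pos, 2, "//") == 0) {
      // Line comment: jump to the newline that ends it.
      pos = src.find('\n', pos);
      if (pos == std::string::npos)
        return std::nullopt;
    } else if (src.compare(pos, 2, "/*") == 0) {
      // Block comment: jump past the closing "*/".
      std::size_t close = src.find("*/", pos + 2);
      if (close == std::string::npos)
        return std::nullopt;
      pos = close + 2;
    } else {
      break;
    }
  }
  if (pos < src.size() && src[pos] == ';')
    return pos;
  return std::nullopt;
}

// Wrap a statement in braces. lastTokenBegin is the offset where the
// statement's last token STARTS, mirroring the end location stored in the AST.
std::string addBraces(std::string src, std::size_t stmtBegin,
                      std::size_t lastTokenBegin) {
  std::size_t afterToken = endOfToken(src, lastTokenBegin);
  std::optional<std::size_t> semi = findSemiAfterOffset(src, afterToken);
  // If no semicolon follows, fall back to the end of the last token.
  std::size_t closeAt = semi ? *semi + 1 : afterToken;
  src.insert(closeAt, "}"); // later offset first, so stmtBegin stays valid
  src.insert(stmtBegin, "{");
  return src;
}
```

For the input `if (x) y = 10;`, with the statement starting at offset 7 and its last token `10` starting at offset 11, `addBraces` yields `if (x) {y = 10;}`. In real libTooling code the first step corresponds to `Lexer::getLocForEndOfToken`, and the second step is the raw lexing that the Clang helper mentioned in this answer performs.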

    There is an example of doing this in clang::arcmt::trans::findSemiAfterLocation in the Clang source code. The essence is these lines:

      // Lex from the start of the given location.
      Lexer lexer(SM.getLocForStartOfFile(locInfo.first),
                  Ctx.getLangOpts(),
                  file.begin(), tokenBegin, file.end());
      Token tok;
      lexer.LexFromRawLexer(tok);
      if (tok.isNot(tok::semi)) {
        if (!IsDecl)
          return SourceLocation();
        // Declaration may be followed with other tokens; such as an __attribute,
        // before ending with a semicolon.
        return findSemiAfterLocation(tok.getLocation(), Ctx, /*IsDecl*/true);
      }