
using clang's libTooling to rewrite nested ternary expressions


The following C source code, which contains a nested ternary expression, is not rewritten correctly by my RecursiveASTVisitor, which uses libTooling's clang::Rewriter.

I cannot figure out why this is happening, but I suspect it has something to do with overlapping writes, where the outer Rewriter::ReplaceText does not consider the effect of the nested ReplaceText.

Given that the expressions are nested, I would like to start rewriting the ternaries' source from the innermost AST nodes and work back up through the parent nodes recursively. As each parent clang::ConditionalOperator node is encountered, it would be rewritten while preserving the rewrites already made for its child nodes.

The original C code below highlights the nested ConditionalOperators:

void nestedTernaryDeclStmt() {
    int ii, jj, kk, ll, mm;
    int foo =  ii > jj  ? ( kk <= ll ) ? mm : 4123 : 5321 ;
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ (outer ternary)
                          ^~~~~~~~~~~~~~~~~~~~~~~^        (nested ternary)
}

I added diagnostic code to examine the edit buffer and locations after each callback. Unfortunately, rewriting the outermost ConditionalOperator discards the previously replaced text from the child ConditionalOperators.

At the end of the first bool VisitConditionalOperator(clang::ConditionalOperator *CO) const callback, the contents of the file's edit buffer (see the str variable) are correctly rewritten (ignoring the leading comments) as:

// Rewrite buffer after nested rewrite
// this replaced 
// '( kk <= ll ) ? mm : 4123' 
// with 
// '(/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123)'
void nestedTernaryDeclStmt() {
    int ii, jj, kk, ll, mm;
    int foo =  ii > jj  ? (/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123) : 5321 ;
}

This is exactly what I expected: the innermost ternary ( kk <= ll ) ? mm : 4123 is completely rewritten and placed where expected in the rewrite buffer. I did not have to use an adjusted end location for this to work. A SourceRange normally ends at the start of its last token, but as far as I know the Rewriter extends the replaced length to the end of that last token.
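To illustrate the token-range convention mentioned above, here is a minimal standard-library sketch. The helpers endOfToken and tokenRangeText are hypothetical names made up for this illustration, and the lexing rule is a toy assumption (a token is either a run of identifier or digit characters, or one single character), so it is not what clang::Lexer actually does. It only shows why a range whose end points at the start of the last token needs that token's length added before the text is extracted:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy rule (an assumption, not Clang's lexer): a token is a run of
// [A-Za-z0-9_] characters, or otherwise exactly one character.
std::size_t endOfToken(const std::string& src, std::size_t tokenStart) {
    assert(tokenStart < src.size());
    auto isWord = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };
    std::size_t end = tokenStart + 1;
    if (isWord(src[tokenStart])) {
        while (end < src.size() && isWord(src[end])) {
            ++end;
        }
    }
    return end;
}

// 'begin' is the offset of the first token and 'lastTokenStart' is the
// offset where the last token STARTS, mirroring a token-based range.
std::string tokenRangeText(const std::string& src,
                           std::size_t begin,
                           std::size_t lastTokenStart) {
    // Extend the range to the end of the last token before slicing.
    return src.substr(begin, endOfToken(src, lastTokenStart) - begin);
}
```

For the inner ternary, the last token 4123 starts at offset 20 of "( kk <= ll ) ? mm : 4123", so slicing up to offset 20 alone would drop the literal, while tokenRangeText extends the slice to the end of it.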

Note that in order to start rewriting from the innermost ternary expression, I needed to return true from an overridden bool shouldTraversePostOrder() const. Without this, the visitor starts from the outermost ternary and works its way down to the child nodes, which is not what we want for handling nesting (and when I tried it, the result was even worse).

When bool VisitConditionalOperator(clang::ConditionalOperator *CO) const is called for the second time, this time on the outer ternary expression, dumping the rewrite buffer shows the following corrupted content (again, ignore the leading comments and the highlighted range ^~~ ... ~~^):

// Rewrite buffer after outer (final) rewrite
// (the range below should have been replaced with the above replaced text)
void nestedTernaryDeclStmt() {
    int ii, jj, kk, ll, mm;
    int foo =  (/*COND1*/ii > jj) ? (/*LHS1*/( kk <= ll ) ? mm : 4123) : (/*RHS1*/5321) ;
                                             ^~~~~~~~~~~~~~~~~~~~~~~^ 
}

The highlighted range above should have been modified by the first rewrite but it was not. I was expecting the final edit buffer to be the following (ignore function comment):

// This is what I expected as the final rewritten buffer.
void nestedTernaryDeclStmt() {
    int ii, jj, kk, ll, mm;
    int foo =  (/*COND1*/ii > jj) ? (/*LHS1*/(/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123)) : (/*RHS1*/5321) ;
}

I tried many other combinations of token and character (non-token) ranges, but nothing I tried works.

Here is the problematic visitor with a helper lambda to convert clang::Stmt to std::string.

// Experimental recursive visitor class.
class MyVisitor : public clang::RecursiveASTVisitor<MyVisitor> {
public:
    explicit MyVisitor(
        clang::ASTContext& rContext,
        clang::Rewriter& rRewriter)
        : mContext{rContext}
        , mRewriter{rRewriter}
    {}

    // default behavior is to traverse the AST in pre-order (override to true to force post-order).
    // @JC note that since this uses CRTP pattern (i.e. class Derived : public Base<Derived>),
    // the method is not virtual & bypasses the need for a VTable - very clever!
    bool shouldTraversePostOrder() const {
        return true;
    }

    //! Visitor pattern callback for 'ConditionalOperator'.
    bool VisitConditionalOperator(clang::ConditionalOperator *CO) const {
        // This method is called for every 'ConditionalOperator' in the code.
        // You can examine 'CO' to extract information about it.
        const auto& SM = mContext.getSourceManager();
        const auto& LO = mContext.getLangOpts();
        const auto sourceRange = CO->getSourceRange();

        // Assume SM is a clang::SourceManager object and Range is a clang::SourceRange for the range
        const auto BLoc = sourceRange.getBegin();
        const auto ELoc = sourceRange.getEnd();

        // Adjust the end location to the end of the last token
        const auto AdjustedELoc = clang::Lexer::getLocForEndOfToken(
            ELoc, 0, SM, LO);

        // Create adjusted range that includes the length of the last token
        clang::SourceRange AdjustedRange(BLoc, AdjustedELoc);

        auto CSR1 = clang::CharSourceRange::getCharRange(BLoc, AdjustedELoc);
        //CSR1.setTokenRange(true);

        unsigned BLine = SM.getSpellingLineNumber(BLoc);
        unsigned BCol = SM.getSpellingColumnNumber(BLoc);
        unsigned ELine = SM.getSpellingLineNumber(ELoc);
        unsigned ECol = SM.getSpellingColumnNumber(ELoc);
        unsigned adjustedELine = SM.getSpellingLineNumber(AdjustedELoc);
        unsigned adjustedECol = SM.getSpellingColumnNumber(AdjustedELoc);


        auto cond = gStmtToString(&mContext, CO->getCond());
        auto lhs = gStmtToString(&mContext, CO->getLHS());
        auto rhs = gStmtToString(&mContext, CO->getRHS());

        // Instrument as follows:
        // add comments to each part of the ConditionalOperator.
        const auto probeText = std::format(
            "(/*COND{0}*/{1}) ? (/*LHS{0}*/{2}) : (/*RHS{0}*/{3})"
            , gProbeIndex++
            , cond
            , lhs
            , rhs);
        mRewriter.ReplaceText(/*AdjustedRange*/sourceRange/*CSR1*/, probeText);

        // Get the RewriteBuffer for the main file.
        std::string str;
        llvm::raw_string_ostream rso(str);
        clang::RewriteBuffer &RB = mRewriter.getEditBuffer(SM.getMainFileID());
        RB.write(rso);
        rso.flush();

        // returning false aborts the traversal
        return true;
    }

private:
    clang::ASTContext& mContext;
    clang::Rewriter& mRewriter;
};

Here is the helper lambda to convert clang::Stmt (and therefore clang::Expr) to std::string.

// SYSTEM INCLUDES
#include <string>
#pragma warning(push)
#pragma warning(disable : 4146 4267 4291 4100 4244 4624)
#include <clang/Tooling/Tooling.h>
#include <clang/AST/ASTConsumer.h>
#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/AST/RecursiveASTVisitor.h>
#include <clang/ASTMatchers/ASTMatchers.h>
#include <clang/ASTMatchers/ASTMatchFinder.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Rewrite/Core/Rewriter.h>
#pragma warning(pop)

// STATIC VARIABLE INITIALIZATIONS
namespace {
    //! The CProbe tool category.
    cl::OptionCategory gToolCategory("CProbe Tool Category");

    //! The probe index
    int gProbeIndex = 0;

    // Convert clang::Stmt or clang::Expr (subclass of Stmt) to std::string.
    auto gStmtToString = [](
        const clang::ASTContext* context,
        const clang::Stmt* stmt) -> std::string {
            const auto& SM = context->getSourceManager();
            const auto& LO = context->getLangOpts();
            const auto startLoc = stmt->getBeginLoc();
            const auto endLoc = clang::Lexer::getLocForEndOfToken(
                stmt->getEndLoc(), 0, SM, LO);
            if (SM.isWrittenInSameFile(startLoc, endLoc)) {
                const auto charRange = clang::CharSourceRange::getCharRange(startLoc, endLoc);
                return clang::Lexer::getSourceText(charRange, SM, LO).str();
            }
            return {};
        };
}

The remaining code to get this working as a stand alone libTooling utility is as follows:

// ASTConsumer implementation reads AST produced by the Clang parser.
class MyASTConsumer : public clang::ASTConsumer {
public:
    MyASTConsumer(clang::ASTContext& ctx, clang::Rewriter& r)
        : mVisitor(ctx, r) {
        // Add handler for translation unit.
        mMatcher.addMatcher(traverse(clang::TK_IgnoreUnlessSpelledInSource,
            translationUnitDecl().
            bind("translationUnitDecl")), &mTranslationUnitHandler);
    }

    //! Insert the macro definitions at the beginning of the translation unit.
    void HandleTranslationUnit(clang::ASTContext& context) override {
        // first expand all the ConditionalOperator nodes
        mVisitor.TraverseDecl(context.getTranslationUnitDecl());
    }

private:
    TranslationUnitHandler mTranslationUnitHandler;
    MyVisitor mVisitor;
};

// For each source file provided to the tool, a new FrontendAction is created.
class CProbeFrontEndAction : public clang::ASTFrontendAction {
public:
    //! explicit Constructor - pass the output path to write the instrumented source.
    explicit CProbeFrontEndAction(fs::path rOutputPath)
        : mOutputPath(std::move(rOutputPath))
    {}

    //! This function is called after parsing the source file.
    void EndSourceFileAction() override {
        const auto& SM = mRewriter.getSourceMgr();
        std::error_code error_code;
        raw_fd_ostream outFile(
            mOutputPath.generic_string(),
            error_code, sys::fs::OF_None);
        // write the result to outFile
        mRewriter.getEditBuffer(SM.getMainFileID()).write(outFile);
        outFile.close();
    }

    std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(clang::CompilerInstance& CI, StringRef file) override {
        mRewriter.setSourceMgr(CI.getSourceManager(), CI.getLangOpts());
        return std::make_unique<MyASTConsumer>(CI.getASTContext(), mRewriter);
    }
private:
    fs::path mOutputPath;
    clang::Rewriter mRewriter;
};

//! Factory function for creating a new FrontendAction that takes a user parameter.
std::unique_ptr<FrontendActionFactory> myNewFrontendActionFactory(const fs::path& rOutputPath) {
    class SimpleFrontendActionFactory : public FrontendActionFactory {
    public:
        explicit SimpleFrontendActionFactory(fs::path aOutputPath)
            : mOutputPath(std::move(aOutputPath))
        {}

        std::unique_ptr<clang::FrontendAction> create() override {
            return std::make_unique<CProbeFrontEndAction>(mOutputPath);
        }
    private:
        fs::path mOutputPath;
    };
    return std::make_unique<SimpleFrontendActionFactory>(rOutputPath);
}

// Instrument the code - rewriting it to rOutputPath.
void instrumentSource(const fs::path& rInputPath, const fs::path& rOutputPath) {
    std::string program = "foo.exe";
    std::string inputPath = rInputPath.generic_string();
    std::string outputPath = rOutputPath.generic_string();
    const char* argv[] = { program.data(), inputPath.data(), "--", nullptr };
    int argc = 3;
    auto expectedParser = CommonOptionsParser::create(
        argc, argv, gToolCategory);
    if (expectedParser) {
        CommonOptionsParser& optionsParser = expectedParser.get();
        ClangTool tool(optionsParser.getCompilations(), optionsParser.getSourcePathList());
        tool.run(myNewFrontendActionFactory(rOutputPath.generic_string()).get());
    }
}

For additional background, here is a simplified AST as reported by clang-query (with set traversal IgnoreUnlessSpelledInSource enabled):

clang-query> m conditionalOperator().bind("ternary")

Match #1:

Binding for "ternary":
ConditionalOperator 0x558606d904e0 </mnt/c/temp/trivial/src/NanoFile.c:3:16, col:54> 'int'
|-BinaryOperator 0x558606d90368 <col:16, col:21> 'int' '>'
| |-DeclRefExpr 0x558606d902f8 <col:16> 'int' lvalue Var 0x558606d8ffc8 'ii' 'int'
| `-DeclRefExpr 0x558606d90318 <col:21> 'int' lvalue Var 0x558606d90048 'jj' 'int'
|-ConditionalOperator 0x558606d90490 <col:27, col:47> 'int'
| |-BinaryOperator 0x558606d903f8 <col:29, col:35> 'int' '<='
| | |-DeclRefExpr 0x558606d90388 <col:29> 'int' lvalue Var 0x558606d900c8 'kk' 'int'
| | `-DeclRefExpr 0x558606d903a8 <col:35> 'int' lvalue Var 0x558606d90148 'll' 'int'
| |-DeclRefExpr 0x558606d90438 <col:42> 'int' lvalue Var 0x558606d901c8 'mm' 'int'
| `-IntegerLiteral 0x558606d90458 <col:47> 'int' 4123
`-IntegerLiteral 0x558606d904c0 <col:54> 'int' 5321


Match #2:

Binding for "ternary":
ConditionalOperator 0x558606d90490 </mnt/c/temp/trivial/src/NanoFile.c:3:27, col:47> 'int'
|-BinaryOperator 0x558606d903f8 <col:29, col:35> 'int' '<='
| |-DeclRefExpr 0x558606d90388 <col:29> 'int' lvalue Var 0x558606d900c8 'kk' 'int'
| `-DeclRefExpr 0x558606d903a8 <col:35> 'int' lvalue Var 0x558606d90148 'll' 'int'
|-DeclRefExpr 0x558606d90438 <col:42> 'int' lvalue Var 0x558606d901c8 'mm' 'int'
`-IntegerLiteral 0x558606d90458 <col:47> 'int' 4123

2 matches.

Here are the ternaries as reported by clang-query:

clang-query> m conditionalOperator().bind("ternary")

Match #1:

/mnt/c/temp/trivial/src/NanoFile.c:3:16: note: "ternary" binds here
    3 |     int foo =  ii > jj  ? ( kk <= ll ) ? mm : 4123 : 5321 ;
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Match #2:

/mnt/c/temp/trivial/src/NanoFile.c:3:27: note: "ternary" binds here
    3 |     int foo =  ii > jj  ? ( kk <= ll ) ? mm : 4123 : 5321 ;
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~
2 matches.
clang-query>

Solution

    Fixing the bug

    Your attempt was close to working. The bug is on this line in gStmtToString:

    return clang::Lexer::getSourceText(charRange, SM, LO).str();
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    

    This retrieves the original source text for the indicated range. When this code is used to get the text of a sub-expression that has already been rewritten, it effectively discards those rewrites.

    The fix is to instead ask the Rewriter for the modified code:

    return mRewriter.getRewrittenText(charRange);
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    

    With that fix, your program should work as intended.
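    To make the difference concrete, here is a minimal standard-library sketch. ToyRewriter, wrap, and rewriteNested are hypothetical names invented for this illustration, and the offset bookkeeping is a deliberately simplified assumption, not how clang::Rewriter is implemented. It only models the idea that ranges are always given as original-source offsets, and contrasts reading the original text with reading the edited buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a rewrite buffer. This is NOT clang::Rewriter; it only
// mimics callers addressing text by offsets into the ORIGINAL source.
class ToyRewriter {
public:
    explicit ToyRewriter(const std::string& original)
        : mOriginal(original), mBuffer(original) {}

    // Analogue of Lexer::getSourceText: always reads the untouched original.
    std::string originalText(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= mOriginal.size());
        return mOriginal.substr(begin, end - begin);
    }

    // Analogue of Rewriter::getRewrittenText: reads the edited buffer.
    std::string rewrittenText(std::size_t begin, std::size_t end) const {
        const std::size_t b = mapOffset(begin);
        return mBuffer.substr(b, mapOffset(end) - b);
    }

    // Analogue of Rewriter::ReplaceText for the original range [begin, end).
    void replace(std::size_t begin, std::size_t end, const std::string& text) {
        const std::size_t b = mapOffset(begin);
        const std::size_t e = mapOffset(end);
        mBuffer.replace(b, e - b, text);
        // Everything at or after 'end' shifts by the change in length.
        mShifts.emplace_back(end, static_cast<std::ptrdiff_t>(text.size()) -
                                      static_cast<std::ptrdiff_t>(e - b));
    }

    const std::string& buffer() const { return mBuffer; }

private:
    // Translate an original offset into an offset in the edited buffer.
    std::size_t mapOffset(std::size_t original) const {
        std::ptrdiff_t mapped = static_cast<std::ptrdiff_t>(original);
        for (const auto& shift : mShifts) {
            if (shift.first <= original) {
                mapped += shift.second;
            }
        }
        return static_cast<std::size_t>(mapped);
    }

    std::string mOriginal;
    std::string mBuffer;
    std::vector<std::pair<std::size_t, std::ptrdiff_t>> mShifts;
};

// Build the instrumented text in the same shape as the question's probeText.
std::string wrap(int index, const std::string& cond,
                 const std::string& lhs, const std::string& rhs) {
    const std::string n = std::to_string(index);
    return "(/*COND" + n + "*/" + cond + ") ? (/*LHS" + n + "*/" + lhs +
           ") : (/*RHS" + n + "*/" + rhs + ")";
}

// Rewrite the inner ternary first, then the outer one (bottom-up).
std::string rewriteNested(bool useRewrittenText) {
    ToyRewriter rw("1? 2? 3 : 4 : 5;");
    auto text = [&](std::size_t b, std::size_t e) {
        return useRewrittenText ? rw.rewrittenText(b, e)
                                : rw.originalText(b, e);
    };
    // Inner ternary "2? 3 : 4" occupies original offsets [3, 11).
    rw.replace(3, 11, wrap(0, text(3, 4), text(6, 7), text(10, 11)));
    // Outer ternary occupies [0, 15); its LHS is the inner range [3, 11).
    rw.replace(0, 15, wrap(1, text(0, 1), text(3, 11), text(14, 15)));
    return rw.buffer();
}
```

    Under these assumptions, rewriteNested(true) keeps the inner COND0/LHS0/RHS0 markers inside the outer LHS1, while rewriteNested(false) reproduces the symptom from the question, because the outer rewrite re-reads 2? 3 : 4 from the original text.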

    Conceptual background

    Just for clarity, some of the relevant pieces in play are:

    • The SourceManager, which provides access to original source code, and maintains the data structures that allow a SourceLocation to be compactly encoded as a single 32-bit integer.

    • The Lexer, which provides a token-based view of the text the SourceManager has, and is also capable of interacting with a Preprocessor to do macro expansion, etc.

    • The Rewriter, which maintains an ordered sequence of rewrite actions to perform. You can ask it to synthesize some or all of the text that results from applying those rewritings, but it doesn't actually change anything outside its own object until you call overwriteChangedFiles.

    Post-order traversal

    For this procedure to work, it's important to override RecursiveASTVisitor::shouldTraversePostOrder() to return true so that rewrites are performed on sub-expressions before containing expressions. Otherwise, the locations of sub-expressions are "lost" because when text is replaced, the new text does not have any location information.

    For example, if the program at the end of this answer is changed to return false from shouldTraversePostOrder(), then when it rewrites the input line:

      1? 2? 3 : 4 : 5;
    

    The first replacement yields:

                   ------ extracted text --------
                  /              |               \
               "1"           "2? 3 : 4"           "5"
                V             VVVVVVVV             V
      (/*COND0*/1) ? (/*LHS0*/2? 3 : 4) : (/*RHS0*/5);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       inserted text
    

    but now that entire line, except for the semicolon, lacks source information. So when we try to rewrite the sub-expression 2? 3 : 4, the Rewriter cannot find its proper start and end locations, nor the locations of its sub-expressions (2, etc.), so the attempt yields nonsense:

                                                      --- extracted text ----
                                                     /           |           \
                                                  "2"           "3"           "4"
                                                   V             V             V
      (/*COND0*/1) ? (/*LHS0*/2? 3 : 4) :(/*COND1*/ ) ? (/*LHS1*/*) : (/*RHS1*/0)*/5);
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                 inserted text (in wrong spot)
    

    The precise extraction and insertion locations are a consequence of the way the Rewriter does its internal arithmetic, which assumes that the given source locations can still be located. Since they cannot, the calculations go awry. (Nothing in Rewriter could fix this; the only alternative would be for it to signal an error.)

    In contrast, when rewriting bottom-up, the first replacement is:

                      --- extracted text ----
                     /           |           \
                  "2"           "3"           "4"
                   V             V             V
      1? (/*COND0*/2) ? (/*LHS0*/3) : (/*RHS0*/4) : 5;
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      inserted text
    

    All of the characters marked "inserted text" lack source location information, but the others still have it, so when gStmtToString extracts the text for the sub-expressions, the 1 and 5 are still there, and the source range that originally contained 2? 3 : 4 now maps to the replacement text, so the rewrite yields:

                   -------------------- extracted text --------------------------
                  /                            |                                 \
               "1"                         "2? 3 : 4"                             "5"
                V             VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV             V
      (/*COND1*/1) ? (/*LHS1*/(/*COND0*/2) ? (/*LHS0*/3) : (/*RHS0*/4)) : (/*RHS1*/5);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                      inserted text
    
    

    In the code in the question (and preserved in my answer, below), shouldTraversePostOrder() has been overridden to return true, which is correct. This section is just explaining why that is necessary.
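    The visiting order itself can be sketched without Clang. ToyExpr, leaf, ternary, visitTernaries, and ternaryVisitOrder below are hypothetical names for this illustration only; the real traversal is done by RecursiveASTVisitor, and this toy tree merely assumes a ternary node has exactly three children (condition, LHS, RHS):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; a stand-in for clang::Expr, not the real class.
struct ToyExpr {
    std::string text;               // source text of this expression
    bool isTernary;                 // true for a ConditionalOperator-like node
    std::vector<ToyExpr> children;  // cond, lhs, rhs when isTernary is true
};

ToyExpr leaf(const std::string& text) {
    return ToyExpr{text, false, {}};
}

ToyExpr ternary(const std::string& text, ToyExpr cond, ToyExpr lhs, ToyExpr rhs) {
    ToyExpr e{text, true, {}};
    e.children.push_back(std::move(cond));
    e.children.push_back(std::move(lhs));
    e.children.push_back(std::move(rhs));
    return e;
}

// Record each ternary either before its children (pre-order) or after
// them (post-order), which is the choice shouldTraversePostOrder() makes.
void visitTernaries(const ToyExpr& e, bool postOrder,
                    std::vector<std::string>& visited) {
    assert(!e.isTernary || e.children.size() == 3);
    if (e.isTernary && !postOrder) {
        visited.push_back(e.text);
    }
    for (const ToyExpr& child : e.children) {
        visitTernaries(child, postOrder, visited);
    }
    if (e.isTernary && postOrder) {
        visited.push_back(e.text);
    }
}

// Visit order for the expression 1? 2? 3 : 4 : 5.
std::vector<std::string> ternaryVisitOrder(bool postOrder) {
    const ToyExpr outer = ternary(
        "1? 2? 3 : 4 : 5",
        leaf("1"),
        ternary("2? 3 : 4", leaf("2"), leaf("3"), leaf("4")),
        leaf("5"));
    std::vector<std::string> visited;
    visitTernaries(outer, postOrder, visited);
    return visited;
}
```

    With postOrder set to true, the inner 2? 3 : 4 is recorded before the enclosing expression, which is the order the rewriting needs; with false, the enclosing expression comes first.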

    Complete demonstration program

    The code in the question isn't quite enough to make a working program, so I ended up filling in some details and deleting others, but the essence is still here.

    // rewrite-ternary.cc
    // Attempt to rewrite ternary expressions.
    
    #include "clang/AST/ASTContext.h"                          // clang::ASTContext
    #include "clang/AST/RecursiveASTVisitor.h"                 // clang::RecursiveASTVisitor
    #include "clang/Basic/Diagnostic.h"                        // clang::DiagnosticsEngine
    #include "clang/Basic/DiagnosticOptions.h"                 // clang::DiagnosticOptions
    #include "clang/Basic/SourceLocation.h"                    // clang::SourceLocation
    #include "clang/Basic/SourceManager.h"                     // clang::SourceManager
    #include "clang/Frontend/ASTUnit.h"                        // clang::ASTUnit
    #include "clang/Frontend/CompilerInstance.h"               // clang::CompilerInstance
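    #include "clang/Frontend/Utils.h"                          // clang::createInvocation
    #include "clang/Lex/Lexer.h"                               // clang::Lexer
    #include "clang/Rewrite/Core/Rewriter.h"                   // clang::Rewriter
    #include "clang/Serialization/PCHContainerOperations.h"    // clang::PCHContainerOperations
    
    #include <iostream>                                        // std::cout
    #include <memory>                                          // std::shared_ptr, std::unique_ptr
    #include <sstream>                                         // std::ostringstream
    #include <string>                                          // std::string
    #include <vector>                                          // std::vector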
    
    using std::cout;
    
    int gProbeIndex = 0;
    
    // Experimental recursive visitor class.
    class MyVisitor : public clang::RecursiveASTVisitor<MyVisitor> {
    public:
        explicit MyVisitor(
            clang::ASTContext& rContext,
            clang::Rewriter& rRewriter)
            : mContext{rContext}
            , mRewriter{rRewriter}
        {}
    
        // default behavior is to traverse the AST in pre-order (override to true to force post-order).
        // @JC note that since this uses CRTP pattern (i.e. class Derived : public Base<Derived>),
        // the method is not virtual & bypasses the need for a VTable - very clever!
        bool shouldTraversePostOrder() const {
            return true;
        }
    
        std::string gStmtToString(const clang::Stmt* stmt) const {
            const auto& SM = mContext.getSourceManager();
            const auto& LO = mContext.getLangOpts();
            const auto startLoc = stmt->getBeginLoc();
            const auto endLoc = clang::Lexer::getLocForEndOfToken(
                stmt->getEndLoc(), 0, SM, LO);
            if (SM.isWrittenInSameFile(startLoc, endLoc)) {
                const auto charRange = clang::CharSourceRange::getCharRange(startLoc, endLoc);
    
                // This is the key fix: get the text *after* taking prior
                // rewrites into account.
                return mRewriter.getRewrittenText(charRange);
    
                // This was the old code.  It simply reads the original
                // text, and consequently, any rewrites that had been
                // performed within the indicated range were lost.
                //return clang::Lexer::getSourceText(charRange, SM, LO).str();
            }
            return {};
        }
    
        //! Visitor pattern callback for 'ConditionalOperator'.
        bool VisitConditionalOperator(clang::ConditionalOperator *CO) const {
            // This method is called for every 'ConditionalOperator' in the code.
            // You can examine 'CO' to extract information about it.
            const auto& SM = mContext.getSourceManager();
            const auto sourceRange = CO->getSourceRange();
    
            // Diagnostics.
            cout << "ternary range: " << sourceRange.printToString(SM) << "\n";
            cout << "gStmtToString: " << gStmtToString(CO) << "\n";
    
            // Get the text of the sub-expressions, taking into account the
            // rewrites already performed on any of *their* sub-expressions.
            auto cond = gStmtToString(CO->getCond());
            auto lhs = gStmtToString(CO->getLHS());
            auto rhs = gStmtToString(CO->getRHS());
    
            // Construct the instrumented code that will replace the
            // original expression 'CO'.
            std::string probeText;
            {
                // I'm not set up for c++20 at the moment, so use
                // std::ostringstream as an alternative to std::format.
                std::ostringstream oss;
                oss << "(/*COND" << gProbeIndex << "*/" << cond
                    << ") ? (/*LHS" << gProbeIndex << "*/" << lhs
                    << ") : (/*RHS" << gProbeIndex << "*/" << rhs
                    << ")";
                probeText = oss.str();
                ++gProbeIndex;
            }
            mRewriter.ReplaceText(sourceRange, probeText);
    
            // Get the entire current rewrite buffer.
            std::string str;
            {
                llvm::raw_string_ostream rso(str);
                clang::RewriteBuffer &RB = mRewriter.getEditBuffer(SM.getMainFileID());
                RB.write(rso);
                rso.flush();
            }
    
            // Print it for diagnostic purposes.
            cout << "---- BEGIN RB ----\n";
            cout << str;
            cout << "---- END RB ----\n";
    
            // returning false aborts the traversal
            return true;
        }
    
    private:
        clang::ASTContext& mContext;
        clang::Rewriter& mRewriter;
    };
    
    
    // This is all boilerplate for a program using the Clang C++ API
    // ("libtooling") but not using the "tooling" part specifically.
    int main(int argc, char const **argv)
    {
      // Copy the arguments into a vector of char pointers since that is
      // what 'createInvocation' wants.
      std::vector<char const *> commandLine;
      {
        // Path to the 'clang' binary that I am behaving like.  This path is
        // used to compute the location of compiler headers like stddef.h.
        // The Makefile sets 'CLANG_LLVM_INSTALL_DIR' on the compilation
        // command line.
        commandLine.push_back(CLANG_LLVM_INSTALL_DIR "/bin/clang");
    
        for (int i = 1; i < argc; ++i) {
          commandLine.push_back(argv[i]);
        }
      }
    
      // Parse the command line options.
      std::shared_ptr<clang::CompilerInvocation> compilerInvocation(
        clang::createInvocation(llvm::ArrayRef(commandLine)));
      if (!compilerInvocation) {
        // Command line parsing errors have already been printed.
        return 2;
      }
    
      // Boilerplate setup for 'LoadFromCompilerInvocationAction'.
      std::shared_ptr<clang::PCHContainerOperations> pchContainerOps(
        new clang::PCHContainerOperations());
      clang::IntrusiveRefCntPtr<clang::DiagnosticsEngine> diagnosticsEngine(
        clang::CompilerInstance::createDiagnostics(
          new clang::DiagnosticOptions));
    
      // Run the Clang parser to produce an AST.
      std::unique_ptr<clang::ASTUnit> ast(
        clang::ASTUnit::LoadFromCompilerInvocationAction(
          compilerInvocation,
          pchContainerOps,
          diagnosticsEngine));
    
      if (ast == nullptr ||
          diagnosticsEngine->getNumErrors() > 0) {
        // Error messages have already been printed.
        return 2;
      }
    
      clang::ASTContext &astContext = ast->getASTContext();
    
      clang::Rewriter rewriter(astContext.getSourceManager(),
                               astContext.getLangOpts());
    
      MyVisitor visitor(astContext, rewriter);
      visitor.TraverseDecl(astContext.getTranslationUnitDecl());
    
      return 0;
    }
    
    
    // EOF
    
    # Makefile
    
    # Default target.
    all:
    .PHONY: all
    
    
    # ---- Configuration ----
    # Installation directory from a binary distribution.
    # Has five subdirectories: bin include lib libexec share.
    CLANG_LLVM_INSTALL_DIR = $(HOME)/opt/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04
    
    # ---- llvm-config query results ----
    # Program to query the various LLVM configuration options.
    LLVM_CONFIG := $(CLANG_LLVM_INSTALL_DIR)/bin/llvm-config
    
    # C++ compiler options to ensure ABI compatibility.
    LLVM_CXXFLAGS := $(shell $(LLVM_CONFIG) --cxxflags)
    
    # Directory containing the clang library files, both static and dynamic.
    LLVM_LIBDIR := $(shell $(LLVM_CONFIG) --libdir)
    
    # Other flags needed for linking, whether statically or dynamically.
    LLVM_LDFLAGS_AND_SYSTEM_LIBS := $(shell $(LLVM_CONFIG) --ldflags --system-libs)
    
    
    # ---- Compiler options ----
    # C++ compiler.
    CXX := $(CLANG_LLVM_INSTALL_DIR)/bin/clang++
    
    # Compiler options, including preprocessor options.
    CXXFLAGS =
    CXXFLAGS += -g
    CXXFLAGS += -Wall
    CXXFLAGS += -Werror
    
    # Get llvm compilation flags.
    CXXFLAGS += $(LLVM_CXXFLAGS)
    
    # Tell the source code where the clang installation directory is.
    CXXFLAGS += -DCLANG_LLVM_INSTALL_DIR='"$(CLANG_LLVM_INSTALL_DIR)"'
    
    # Linker options.
    LDFLAGS =
    
    LDFLAGS += -g -Wall
    
    # Pull in clang+llvm via libclang-cpp.so, which has everything, but is
    # only available as a dynamic library.
    LDFLAGS += -lclang-cpp
    
    # Arrange for the compiled binary to search the libdir for that library.
    # Otherwise, one can set the LD_LIBRARY_PATH envvar before running it.
    # Note: the -rpath switch does not work on Windows.
    LDFLAGS += -Wl,-rpath=$(LLVM_LIBDIR)
    
    # Get the needed -L search path, plus things like -ldl.
    LDFLAGS += $(LLVM_LDFLAGS_AND_SYSTEM_LIBS)
    
    
    # ---- Recipes ----
    # Compile a C++ source file.
    %.o: %.cc
        $(CXX) -c -o $@ $(CXXFLAGS) $<
    
    # Executable.
    all: rewrite-ternary.exe
    rewrite-ternary.exe: rewrite-ternary.o
        $(CXX) -o $@ $^ $(LDFLAGS)
    
    # Test.
    .PHONY: check
    check: rewrite-ternary.exe
        ./rewrite-ternary.exe -w test.cc
    
    .PHONY: clean
    clean:
        $(RM) *.o *.exe
    
    
    # EOF
    

    Test input:

    void f()
    {
      1? 2? 3 : 4 : 5;
    }
    

    Output:

    ternary range: <test.cc:3:6, col:13>
    gStmtToString: 2? 3 : 4
    ---- BEGIN RB ----
    void f()
    {
      1? (/*COND0*/2) ? (/*LHS0*/3) : (/*RHS0*/4) : 5;
    }
    ---- END RB ----
    ternary range: <test.cc:3:3, col:17>
    gStmtToString: 1? (/*COND0*/2) ? (/*LHS0*/3) : (/*RHS0*/4) : 5
    ---- BEGIN RB ----
    void f()
    {
      (/*COND1*/1) ? (/*LHS1*/(/*COND0*/2) ? (/*LHS0*/3) : (/*RHS0*/4)) : (/*RHS1*/5);
    }
    ---- END RB ----
    

    Second example:

    void f(int ii, int jj, int kk, int ll, int mm)
    {
      int foo = ii > jj ? ( kk <= ll ) ? mm : 4123 : 5321 ;
    }
    

    Output:

    ternary range: <test.cc:3:23, col:43>
    gStmtToString: ( kk <= ll ) ? mm : 4123
    ---- BEGIN RB ----
    void f(int ii, int jj, int kk, int ll, int mm)
    {
      int foo = ii > jj ? (/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123) : 5321 ;
    }
    ---- END RB ----
    ternary range: <test.cc:3:13, col:50>
    gStmtToString: ii > jj ? (/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123) : 5321
    ---- BEGIN RB ----
    void f(int ii, int jj, int kk, int ll, int mm)
    {
      int foo = (/*COND1*/ii > jj) ? (/*LHS1*/(/*COND0*/( kk <= ll )) ? (/*LHS0*/mm) : (/*RHS0*/4123)) : (/*RHS1*/5321) ;
    }
    ---- END RB ----