How do I tell a clang AST tool where to find stddef.h?


I have a rudimentary clang AST visitor (constructed with the help of ChatGPT):

#include <iostream>
#include <llvm/Support/CommandLine.h>
#include <clang/AST/ASTConsumer.h>
#include <clang/ASTMatchers/ASTMatchers.h>
#include <clang/ASTMatchers/ASTMatchFinder.h>
#include <clang/AST/RecordLayout.h>
#include <clang/AST/RecursiveASTVisitor.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Frontend/FrontendAction.h>
#include <clang/Frontend/FrontendActions.h>
#include <clang/Rewrite/Frontend/FrontendActions.h>
#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/Tooling/Tooling.h>

using namespace clang;
using namespace clang::ast_matchers;
using namespace clang::tooling;

static llvm::cl::OptionCategory MyToolCategory("my-tool options");

class OffsetOfVisitor : public RecursiveASTVisitor<OffsetOfVisitor> {
public:
    explicit OffsetOfVisitor(ASTContext *Context)
        : Context(Context) {}

    bool VisitFieldDecl(FieldDecl *FD) {
        const RecordDecl *Parent = FD->getParent();

        std::string FieldName = FD->getNameAsString();
        if (FieldName.empty()) {
            return true;
        }

        std::string ParentName;
        if (const TypedefNameDecl *TND = Parent->getTypedefNameForAnonDecl()) {
            ParentName = TND->getNameAsString();
        }

        if (ParentName.empty()) {
            return true;
        }

        llvm::outs() << ParentName << ' ' << FieldName << "\n";

        return true;
    }

private:
    ASTContext *Context;
};

class OffsetOfConsumer : public ASTConsumer {
public:
    explicit OffsetOfConsumer(ASTContext *Context)
        : Visitor(Context) {}

    void HandleTranslationUnit(ASTContext &Context) override {
        Visitor.TraverseDecl(Context.getTranslationUnitDecl());
    }

private:
    OffsetOfVisitor Visitor;
};

class OffsetOfAction : public ASTFrontendAction {
public:
    std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef File) override {
        return std::make_unique<OffsetOfConsumer>(&CI.getASTContext());
    }
};

int main(int argc, const char **argv) {
    auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
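    if (!ExpectedParser) {
        // Report why option parsing failed; this also consumes the llvm::Error.
        llvm::errs() << ExpectedParser.takeError();
        return 1;
    }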

    clang::tooling::CommonOptionsParser &OptionsParser = ExpectedParser.get();
    clang::tooling::ClangTool Tool(OptionsParser.getCompilations(), OptionsParser.getSourcePathList());

    int result = Tool.run(newFrontendActionFactory<OffsetOfAction>().get());

    return result;
}

// vim: et ts=4 sw=4

but when I run the tool, it fails to find some dependent include files:

# ../visitor t2.cpp
In file included from /home/pjoot/vv/test/t2.cpp:1:
/usr/include/stdio.h:33:10: fatal error: 'stddef.h' file not found
   33 | #include <stddef.h>
      |          ^~~~~~~~~~
__fsid_t __val
__mbstate_t __count
__mbstate_t __value
1 error generated.
Error while processing /home/pjoot/vv/test/t2.cpp.

If I run the clang preprocessor, I see that stddef.h is found in a clang specific path:

# 28 "/usr/include/stdio.h" 2 3 4

extern "C" {

# 1 "/usr/bin/../lib/clang/17/include/stddef.h" 1 3 4

so I can work around the failure by invoking the tool with additional include paths, like so:

../visitor t2.cpp -- -I /usr/bin/../lib/clang/17/include

Two questions:

  1. How can I programmatically add this additional -I path to the tool, so that every invocation doesn't have to pass it?
  2. Better: Is there a way to have the clang AST visitor automatically use all the same include paths that clang itself would use?

EDIT:

Scott's answer https://stackoverflow.com/a/79049157/189270 works. To illustrate, I added a make rule to copy my tool and all the dependent header files into an appropriate directory structure:

> make install
rm -rf tool
mkdir -p tool/bin tool/lib/clang/17
install visitor tool/bin/
(cd tool/lib/clang/17 && ln -s /usr/bin/../lib/clang/17/include)

> find tool -name stddef.h
tool/lib/clang/17/include/stddef.h

> cd test

> ../tool/bin/visitor t2.cpp
__fsid_t __val
__mbstate_t __count
__mbstate_t __value

I've used a symlink instead of copy to avoid copying 200+ header files. Hardcoding the internal clang paths in my makefile is a bit ugly, but I can live with that.


Solution

  • Although phrased quite differently, the question "What's the difference between Clang invoked by Bash and the ClangTool?" is getting at the same underlying issue, and its answer (written by me) is relevant.

    I'm not sure if this should be considered a duplicate question (in part because the other one has a somewhat misleading title), so for now I'll just quote the key parts of the answer:

    Parsing C++ requires more than just a program that can read C++ syntax. It also requires certain header files that are logically part of the compiler rather than the C library; stddef.h is one of the former, while (say) stdio.h is one of the latter. If you use LibTooling to create an executable that parses C++, it is incomplete without the compiler headers, like a video game executable missing its art assets.

    And:

    The proper way to do that is for your tool to have an "install" step or similar that copies the Clang compiler headers along with the executable to a location from which the latter will run. You need to copy every file in lib/clang/$(version)/include.

    The linked question was later edited to add a CMakeLists.txt fragment that does that copying, although I have not tested it.

    Once the proper headers are installed alongside your executable, it will be able to find and use them during parsing.
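
To make "alongside your executable" concrete, here is a minimal sketch of the layout the tool expects, assuming the bin/ and lib/clang/<version>/include arrangement shown in the question's edit. The helper names `expectedBuiltinIncludeDir` and `hasBuiltinHeaders` are hypothetical and not part of Clang. They only mirror the path arithmetic and can serve as a sanity check after an install step.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a Clang API): given the path of the tool
// executable and the Clang version directory name, return the directory
// where the compiler headers are expected, i.e.
// <exe dir>/../lib/clang/<version>/include.
fs::path expectedBuiltinIncludeDir(const fs::path &exePath,
                                   const std::string &clangVersion) {
    fs::path binDir = exePath.parent_path();   // e.g. tool/bin
    fs::path prefix = binDir.parent_path();    // e.g. tool
    return prefix / "lib" / "clang" / clangVersion / "include";
}

// Returns true if stddef.h is present in the expected directory.
bool hasBuiltinHeaders(const fs::path &exePath,
                       const std::string &clangVersion) {
    return fs::exists(expectedBuiltinIncludeDir(exePath, clangVersion) /
                      "stddef.h");
}
```

For the layout in the question, `expectedBuiltinIncludeDir("tool/bin/visitor", "17")` yields `tool/lib/clang/17/include`, which is where the `find` command in the edit located stddef.h.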

    The linked answer (and the several other similar Q+As it links to) also suggests ways of using the clang compiler headers without copying them, but that is, at best, a hack that is only useful during tool development.
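
As an illustration of that development-only hack, the sketch below shows the kind of command-line rewrite involved: inserting `-resource-dir=<dir>` right after the program name so the parser looks for the compiler headers under `<dir>/include`. The helper `withResourceDir` is hypothetical and uses only standard-library types. In a LibTooling tool, a comparable adjustment would normally go through `ClangTool::appendArgumentsAdjuster` with `getInsertArgumentAdjuster`. The hard-coded directory is an assumption about the development machine, not something to ship.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of the compile command with
// -resource-dir=<resourceDir> inserted right after the program name
// (args[0]). If args is empty, the flag becomes the only element.
std::vector<std::string> withResourceDir(const std::vector<std::string> &args,
                                         const std::string &resourceDir) {
    std::vector<std::string> result = args;
    auto pos = result.empty() ? result.end() : result.begin() + 1;
    result.insert(pos, "-resource-dir=" + resourceDir);
    return result;
}
```

With the paths from the question, `withResourceDir({"visitor", "t2.cpp"}, "/usr/lib/clang/17")` produces `visitor -resource-dir=/usr/lib/clang/17 t2.cpp`. This ties the tool to one machine's Clang installation, which is why installing the headers next to the executable is the more robust route.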