c++ clang clang-ast-matchers

What's the difference between Clang invoked by Bash and the ClangTool?


The original question

I followed a tutorial and created a tool named ast_extract to analyze the source code of libpng:

llvm::cl::OptionCategory ASTExtractCategory("ASTExtract tool options");
llvm::Expected<clang::tooling::CommonOptionsParser> optionParser = clang::tooling::CommonOptionsParser::create(argc, argv, ASTExtractCategory);
clang::tooling::ClangTool tool(optionParser->getCompilations(), optionParser->getCompilations().getAllFiles());

However, the tool cannot find stddef.h and fails with the following error:

1 error generated.
Error while processing /data/tfk/study/PUT/libpng/pngmem.c.
In file included from contrib/libtests/pngvalid.c:27:
/usr/include/signal.h:301:11: fatal error: 'stddef.h' file not found
  301 | # include <stddef.h>
      |           ^~~~~~~~~~

I can work around this by passing an extra argument, such as ast_extract --extra-arg='-I/data/tfk/llvm/lib/clang/17/include'. However, when I copy the corresponding command line from compile_commands.json and run it directly in Bash, it compiles without any issue:

clang -c -DHAVE_CONFIG_H -I. -g -O2 -fPIC -DPIC -o .libs/pngmem.o pngmem.c

How can I modify the ast_extract tool to eliminate the need for the extra argument?

Update

Thanks to Scott! I've implemented the first solution mentioned in the answer by adding the following rules to CMakeLists.txt:

# Install the compiler header files that LibTooling cannot find (such as stddef.h)
set(CMAKE_INSTALL_PREFIX ${CMAKE_CURRENT_SOURCE_DIR})
install(TARGETS ast_extract DESTINATION bin)
install(DIRECTORY ${LLVM_INCLUDE_DIR}/../lib/clang/${LLVM_VERSION_MAJOR}/include DESTINATION lib/clang/${LLVM_VERSION_MAJOR})

message(STATUS "LLVM_INCLUDE_DIRS: ${LLVM_INCLUDE_DIRS}")
message(STATUS "CLANG_INCLUDE_DIRS: ${CLANG_INCLUDE_DIRS}")

Despite my best efforts to minimize hard-coding, the path to stddef.h (${LLVM_INCLUDE_DIR}/../lib/clang/${LLVM_VERSION_MAJOR}/include) is still somewhat cumbersome.

Is there a CMake variable that directly corresponds to the needed path?


Solution

  • The difference, in a nutshell, is that the clang invoked by bash has its compiler headers (including stddef.h) nearby in the file system, while your LibTooling executable does not.

    Parsing C++ requires more than just a program that can read C++ syntax. It also requires certain header files that are logically part of the compiler rather than the C library; stddef.h is one of the former, while (say) stdio.h is one of the latter. If you use LibTooling to create an executable that parses C++, it is incomplete without the compiler headers, like a video game executable missing its art assets. (And like a video game could package its art into the executable, a LibTooling executable could have the compiler headers embedded into it, but the LibTooling API is not set up to do that easily, unfortunately.)

    So how does a LibTooling program find its compiler headers? There are several APIs that accept an argv-like array, including the one used in your code, clang::tooling::CommonOptionsParser::create. These APIs arrange to look for the compiler headers relative to the argv[0] in that array, in, essentially, $(dirname argv[0])/../lib/clang/$(version)/include. Consequently, you need to ensure that directory exists and is populated with all of the required files.
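
    As a rough illustration of that lookup rule, here is a minimal sketch that uses only the C++17 standard library. It mirrors the rule as stated above, not Clang's actual lookup code. The helper names expectedCompilerHeaderDir and hasCompilerHeaders are hypothetical, and the version string (for example "17") is an assumption you would need to match to the Clang libraries you link against.

    ```cpp
    #include <cassert>
    #include <filesystem>
    #include <string>

    namespace fs = std::filesystem;

    // Hypothetical helper: approximate the directory in which a LibTooling
    // program started as `argv0` would look for compiler headers such as
    // stddef.h, following the rule <dir of argv0>/../lib/clang/<version>/include.
    fs::path expectedCompilerHeaderDir(const std::string &argv0,
                                       const std::string &clangVersion) {
      assert(!argv0.empty() && "argv[0] must not be empty");
      fs::path binDir = fs::path(argv0).parent_path();  // e.g. /opt/tool/bin
      fs::path dir = binDir / ".." / "lib" / "clang" / clangVersion / "include";
      return dir.lexically_normal();  // collapses the "bin/.." component
    }

    // Hypothetical helper: report whether that directory contains stddef.h.
    bool hasCompilerHeaders(const std::string &argv0,
                            const std::string &clangVersion) {
      return fs::exists(expectedCompilerHeaderDir(argv0, clangVersion) / "stddef.h");
    }
    ```

    Calling hasCompilerHeaders(argv[0], "17") at startup and printing the expected directory when it returns false could turn the confusing "'stddef.h' file not found" error into a more direct diagnostic.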

    Solution: The proper way to do that is for your tool to have an "install" step or similar that copies the Clang compiler headers along with the executable to a location from which the latter will run. You need to copy every file in lib/clang/$(version)/include.

    However, since that's somewhat annoying, what I typically do is have my Makefile pass a -D switch that gives the path to the Clang installation directory I'm using, and then in my C++ code, I simply replace the argv[0] that was passed in by the shell with $(clang_install_dir)/bin/clang, effectively lying to the Clang API to impersonate clang itself. Then the API will find the same compiler headers that clang finds. This of course means my tool only works when a compatible version of clang is installed in that location, so it's suitable for experimentation and one-off development but a tool shouldn't do that if it's going to be installed and used by others. A complete example is in my print-clang-ast tool.
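
    The argv[0] substitution can be sketched roughly as follows. This is a minimal illustration, not the code from print-clang-ast: the helper names impersonateClang and asArgv are hypothetical, and the install directory is assumed to arrive from the build system, for example through a -D macro of your own choosing.

    ```cpp
    #include <cassert>
    #include <string>
    #include <vector>

    // Hypothetical helper: copy the original arguments, but replace argv[0]
    // with <clangInstallDir>/bin/clang so that header lookup relative to
    // argv[0] lands in the Clang installation.
    std::vector<std::string> impersonateClang(int argc, const char *const *argv,
                                              const std::string &clangInstallDir) {
      assert(argc >= 1 && argv != nullptr);
      std::vector<std::string> args;
      args.reserve(static_cast<std::size_t>(argc));
      args.push_back(clangInstallDir + "/bin/clang");  // the substituted argv[0]
      for (int i = 1; i < argc; ++i) {
        args.emplace_back(argv[i]);                    // remaining arguments unchanged
      }
      return args;
    }

    // Hypothetical helper: build a const char* view for APIs that expect
    // (argc, argv). The pointers are valid only while `args` is alive and
    // unmodified.
    std::vector<const char *> asArgv(const std::vector<std::string> &args) {
      std::vector<const char *> out;
      out.reserve(args.size());
      for (const auto &a : args) {
        out.push_back(a.c_str());
      }
      return out;
    }
    ```

    In main, the vector returned by asArgv would then be handed to CommonOptionsParser::create in place of the argv received from the shell, together with an int holding its size.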

    For reference, there are several other questions and discussions that pertain to this topic, although none has a question and a sufficiently complete answer to make it suitable as a duplicate target.

    Some of those provide different workarounds, including passing an additional -I argument, which works as you've noted, but is not a good solution in general.