Callgraphs using GraphViz with CMake and Clang


My goal is to generate call graphs using CMake + Clang + GraphViz at build time.

Using these processes [1, 2] I can create simple graphs, but I'm not sure how to generalise them to a CMake project.

I have an executable target.

add_executable(${TARGET} ${SOURCES})

From within a macro, I add the graph-related options to the target:

target_compile_options(${TARGET} PRIVATE -S -emit-llvm)

I also add an additional post-build command that generates the call graphs:

add_custom_command(
    TARGET ${TARGET}
    POST_BUILD
    COMMENT "Running clang OPT"
    COMMAND opt -analyze -dot-callgraph
)

But clang still attempts to link an executable for the target, which results in this error:

[build] lld-link: error: Container.test.cpp.obj: unknown file type

I also don't understand how any custom command (opt for example) would access the produced LLVM representation. It doesn't look like my custom command has any knowledge of the relevant files (even if the above error was fixed).


What I understand so far:

  1. CMake's add_executable adds the -o outfile.exe argument to the clang invocation, which prevents me from following the steps shown in the linked processes [1, 2].
  2. $<TARGET_FILE:${TARGET}> can be used to find the file clang produces, but I don't know whether this works for the LLVM representation.
  3. I've tried using a custom target instead, but had issues getting all of the TARGET's sources, with all of their settings, into the custom target.
  4. The process outlined here [3] might be relevant, especially -Wl,-save-temps, but this seems a pretty roundabout way to get IR (using llvm-dis).
  5. The unknown file type error occurs because the object is actually LLVM representation, whereas I suspect the linker expects a different format.
  6. To get the linker to understand the LLVM representation, add -flto to the linker options: target_link_options(${TARGET} PRIVATE -flto) (source [4]). This is awesome, because it means I've almost solved this... I just don't know how to get the paths of the produced bitcode files in CMake. Once I do, I can pass them to opt (I hope...).
  7. To get the target's objects, the generator expression $<TARGET_OBJECTS:${TARGET}> can be used. In this case it lists the .o LLVM bitcode files (is the .o extension due to a rename by CMake?).
  8. The .o file in this case is bitcode, but the opt tool appears to accept only the textual LLVM representation. To convert to it: llvm-dis bitcode.bc -o llvm_asm.ll. Due to cross compilation, I believe the mangled symbols are in a strange format. Passing them to llvm-cxxfilt does not succeed, for example: llvm-cxxfilt --no-strip-underscore --types ?streamReconstructedExpression@?$BinaryExpr@AEBV?$reverse_iterator@PEBD@std@@AEBV12@@Catch@@EEBAXAEAV?$basic_ostream@DU?$char_traits@D@std@@@std@@@Z
  9. Addressing 8: this is the MSVC name mangling format, which indicates that clang uses MSVC name mangling when compiling on Windows. A surprise to me... (source [5]).
  10. LLVM ships with llvm-undname, which is able to demangle these symbols. When I give it raw input it produces many errors; it seems to work only when given well-formed symbols. The tool demumble appears to be a cross-platform, multi-format wrapper around llvm-undname and llvm-cxxfilt.

  11. My almost-working CMake macro is as follows:

macro (add_clang_callgraph TARGET)
    if(CALLGRAPH)
        target_compile_options(${TARGET} PRIVATE -emit-llvm)
        target_link_options(${TARGET} PRIVATE -flto)
        
        foreach (FILE $<TARGET_OBJECTS:${TARGET}>)
            add_custom_command(
                TARGET ${TARGET}
                POST_BUILD
                COMMAND llvm-dis ${FILE}
                COMMAND opt -dot-callgraph ${FILE}.ll
                COMMAND demumble ${FILE}.ll.callgraph.dot > ${FILE}.dot
            )
        endforeach()
    endif()
endmacro()

However, this doesn't work... The content of ${FILE} is always the entire list...

This is still the case here:

foreach (FILE IN LISTS $<TARGET_OBJECTS:${TARGET}>)
    add_custom_command(
        TARGET ${TARGET}
        POST_BUILD
        COMMAND echo ${FILE}
    )
endforeach()

The result looks like:

thinga.obj;thingb.obj

This is because CMake doesn't evaluate generator expressions until generation time, AFTER the foreach loop has already run at configure time. The loop therefore runs exactly once, with the unresolved generator expression as its single item (source [6]). This means I cannot loop over the object files and create a separate custom command for each one.


I'll add to the above as I find things out. If I figure out the whole process, I'll post a solution.

Any help would be greatly appreciated, this has been a great pain in the arse.


What I'm hoping for is a way to make CMake build an executable into a single LLVM representation file, use that file with opt to get the call graph, and then finish the compilation with llc. I'm a little constrained though, as I'm cross compiling. Ultimately, anything equivalent will do...


Solution

  • I'll attempt an answer just to gather all my comment responses so far.

    If you want to "subvert" CMake, it can be done with something like this (adapted from here out of OP's point 4 above):

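    # COMMAND_EXPAND_LISTS (used below) needs CMake 3.8 or newer
    cmake_minimum_required(VERSION 3.8)
    
    # the compiler has to be chosen before project() for the setting to take effect
    set(CMAKE_C_COMPILER clang)
    
    project(hello)
    
    # append inside one quoted string; a separate argument would create a ;-list
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -flto")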
    
    add_executable(hello main.c hello.c)
    
    # decide your bitcode generation method here
    # target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -emit-llvm)
    target_compile_options(hello PUBLIC ${CMAKE_C_FLAGS} -c -flto)
    
    # this is just to print
    add_custom_target(print_hello_objs 
      COMMAND ${CMAKE_COMMAND} -E echo $<JOIN:$<TARGET_OBJECTS:hello>," ">)
    
    # this does some linking
    # fill in details here as you need them (e.g., name, location, etc.)
    add_custom_target(link_hello_objs 
      COMMAND llvm-link -o foo.bc $<TARGET_OBJECTS:hello> 
      COMMAND_EXPAND_LISTS)
    

    For uses where per-file processing is required, the COMMAND can be an external script (bash/python) that takes the list and generates the .dot files. The problem with generator expressions is that CMake does not evaluate them until generation time, so they cannot be expanded inside a foreach loop.
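
    A minimal sketch of such an external script in Python, under stated assumptions. The script name make_callgraphs.py is hypothetical. The tool invocations mirror the macro in the question (llvm-dis, opt -dot-callgraph, demumble). The name of the .dot file written by opt is taken from that macro and not checked against a particular LLVM version (newer releases may want -passes=dot-callgraph instead). demumble is assumed to act as a filter from stdin to stdout.

```python
#!/usr/bin/env python3
"""Hypothetical helper (make_callgraphs.py): one call graph per object file."""
import subprocess
import sys


def split_objects(args):
    """Flatten arguments that may still be CMake ';'-separated lists."""
    objects = []
    for arg in args:
        objects.extend(part for part in arg.split(";") if part)
    return objects


def plan(obj):
    """Describe the per-file steps for one bitcode object file.

    Assumption: opt writes <input>.callgraph.dot next to its input,
    as the macro in the question implies.
    """
    ll = obj + ".ll"
    return {
        "disassemble": ["llvm-dis", obj, "-o", ll],
        "callgraph": ["opt", "-dot-callgraph", ll],
        "raw_dot": ll + ".callgraph.dot",
        "final_dot": obj + ".dot",
    }


def run(objects):
    for obj in objects:
        steps = plan(obj)
        subprocess.run(steps["disassemble"], check=True)
        # opt may also write the module to stdout; discard it
        subprocess.run(steps["callgraph"], check=True,
                       stdout=subprocess.DEVNULL)
        # assumption: demumble filters stdin to stdout
        with open(steps["raw_dot"]) as src, open(steps["final_dot"], "w") as dst:
            subprocess.run(["demumble"], check=True, stdin=src, stdout=dst)


if __name__ == "__main__":
    run(split_objects(sys.argv[1:]))
```

    It could then be called from a custom target, passing $<TARGET_OBJECTS:hello> as arguments together with COMMAND_EXPAND_LISTS. The split_objects step only matters if the list arrives as a single unexpanded ;-separated argument.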

    If you want to trigger regeneration based on which object/bitcode file is recompiled, things get tricky, since CMake has preset ways of invoking the components of a toolchain (compiler, linker, etc.). That is why I wrote my CMake-based project back then, but I'd strongly recommend avoiding overengineering at the start, since it sounds as if you're not sure what you're up against yet.

    I haven't bothered making LTO work fully so that a working executable is also produced, since I don't have such a setup on this machine ATM.

    All the other requirements (e.g., Graphviz output, demangling) can be hooked up with further custom targets/commands.
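
    As a rough sketch of the demangling step, the Python function below rewrites the labels of a .dot file one symbol at a time. This sidesteps the issue from the question that llvm-undname struggles with raw input. It assumes node labels have the form label="{symbol}" on record-shaped nodes. The demangle callable is a placeholder for whatever tool is used (for example, a wrapper that runs demumble on a single symbol). Neither assumption is confirmed by the source.

```python
import re

# Assumed label shape in the call graph output: label="{symbol}"
LABEL = re.compile(r'label="\{([^}"]*)\}"')


def escape_record(text):
    """Backslash-escape characters that are special in Graphviz record labels."""
    return re.sub(r'([{}|<>"\\])', r'\\\1', text)


def demangle_labels(dot_text, demangle):
    """Replace the symbol in every label with demangle(symbol).

    demangle is a hypothetical callable taking one mangled symbol and
    returning its readable form.
    """
    def replace(match):
        return 'label="{' + escape_record(demangle(match.group(1))) + '}"'
    return LABEL.sub(replace, dot_text)
```

    Escaping matters here because demangled C++ names often contain angle brackets, which Graphviz would otherwise interpret as record port markers.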

    Other solutions might be:

    1. gllvm
    2. for the desperate llvm-ir-cmake-utils