Search code examples
refactoringcode-analysisdoxygengraphvizcall-graph

Merging C Callergraphs with Doxygen or determining union of all calls


I have a collection of legacy C code which I'm refactoring to split the C computational code from the GUI. This is complicated by the heavily recursive mathematical core code being K&R style declarations. I've already abandoned an attempt to convert these to ANSI declarations due to nested use of function parameters (just couldn't get those last 4 compiler errors to go).

I need to move some files into a pure DLL and determine the minimal interface to make public, which is going to require wrapper functions writing to publish a typed interface.

I've marked up the key source files with the Doxygen @callergraph markup so informative graphs are generated for individual functions. What I'd like to do beyond that is amalgamate these graphs so I can determine the narrowest boundary of functions exposed to the outside world.

The original header files are no use - they expose everything as untyped C functions.

There are hundreds of functions so simple inspection of the generated callergraphs is painful.

I'm considering writing some kind of DOT merge tool - setting DOT_CLEANUP=NO makes Doxygen leave the intermediate DOT files there rather then just retaining the png files they generated.

I'm not obsessed by this being a graphical solution - I'd be quite happy if someone could suggest an alternative analysis tool (free or relatively cheap) or technique using Doxygen's XML output to achieve the same goal.

A callergraph amalgamated at the file level does have a certain appeal for client documentation rather than a plain list :-)


Solution

  • In your Doxyfile, set

    GENERATE_XML = YES
    

    and then you can find your call graph in the XML files. For each function marked with the callergraph, you'll find <referencedby> elements in the output that you can use. Here's a sample from one of my C files:

      <memberdef kind="function" id="spfs_8c_1a3"
                 prot="public" static="yes" const="no" explicit="no"
                 inline="no" virt="non-virtual">
        <type>svn_error_t *</type>
        <definition>svn_error_t * init_apr</definition>
        <argsstring>(apr_pool_t **ppool, int *argc, char const *const **argv)</argsstring>
        <name>init_apr</name>
        <!-- param and description elements clipped for brevity ... -->
        <location file="src/spfs.c" line="26" bodystart="101" bodyend="120"/>
        <referencedby refid="spfs_8c_1a13" compoundref="spfs_8c"
                      startline="45" endline="94">main</referencedby>
      </memberdef>
    

    So for every memberdef/referencedby pair, you have a caller-callee relationship, which you can grab via XSLT:

    <?xml version="1.0" encoding="utf-8"?>
    
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:output method="text"/>
    
        <xsl:template match="/">
            <xsl:apply-templates select="//memberdef[@kind eq 'function']"/>
        </xsl:template>
    
        <xsl:template match="memberdef">
            <xsl:variable name="function-name"
                          select="concat(definition, argsstring)"/>
            <xsl:for-each select="referencedby">
                <xsl:value-of select="concat(./text(), ' calls ', $function-name, '&#xA;')"/>
            </xsl:for-each>
        </xsl:template>
    
    </xsl:stylesheet>
    

    Which gives you a line per caller-callee like this:

    main calls svn_error_t * init_apr(apr_pool_t **ppool, int *argc, char const *const **argv)
    

    You'll want to tweak that XSLT and then partition that directed graph in the way that cuts across the fewest edges. Woo hoo, an NP-complete problem! Luckily, there are lots of papers on the subject, some are here: http://www.sandia.gov/~bahendr/partitioning.html