refactoring code-analysis doxygen graphviz call-graph

Merging C Callergraphs with Doxygen or determining union of all calls

I have a collection of legacy C code which I'm refactoring to split the C computational code from the GUI. This is complicated by the heavily recursive mathematical core code being K&R style declarations. I've already abandoned an attempt to convert these to ANSI declarations due to nested use of function parameters (just couldn't get those last 4 compiler errors to go).

I need to move some files into a pure DLL and determine the minimal interface to make public, which is going to require wrapper functions writing to publish a typed interface.

I've marked up the key source files with the Doxygen @callergraph markup so informative graphs are generated for individual functions. What I'd like to do beyond that is amalgamate these graphs so I can determine the narrowest boundary of functions exposed to the outside world.

The original header files are no use - they expose everything as untyped C functions.

There are hundreds of functions so simple inspection of the generated callergraphs is painful.

I'm considering writing some kind of DOT merge tool - setting DOT_CLEANUP=NO makes Doxygen leave the intermediate DOT files there rather then just retaining the png files they generated.

I'm not obsessed by this being a graphical solution - I'd be quite happy if someone could suggest an alternative analysis tool (free or relatively cheap) or technique using Doxygen's XML output to achieve the same goal.

A callergraph amalgamated at the file level does have a certain appeal for client documentation rather than a plain list :-)

Solution

In your Doxyfile, set

GENERATE_XML = YES

and then you can find your call graph in the XML files. For each function marked with the callergraph, you'll find <referencedby> elements in the output that you can use. Here's a sample from one of my C files:

  <memberdef kind="function" id="spfs_8c_1a3"
             prot="public" static="yes" const="no" explicit="no"
             inline="no" virt="non-virtual">
    <type>svn_error_t *</type>
    <definition>svn_error_t * init_apr</definition>
    <argsstring>(apr_pool_t **ppool, int *argc, char const *const **argv)</argsstring>
    <name>init_apr</name>
    <!-- param and description elements clipped for brevity ... -->
    <location file="src/spfs.c" line="26" bodystart="101" bodyend="120"/>
    <referencedby refid="spfs_8c_1a13" compoundref="spfs_8c"
                  startline="45" endline="94">main</referencedby>
  </memberdef>

So for every memberdef/referencedby pair, you have a caller-callee relationship, which you can grab via XSLT:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:apply-templates select="//memberdef[@kind eq 'function']"/>
    </xsl:template>

    <xsl:template match="memberdef">
        <xsl:variable name="function-name"
                      select="concat(definition, argsstring)"/>
        <xsl:for-each select="referencedby">
            <xsl:value-of select="concat(./text(), ' calls ', $function-name, '&#xA;')"/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

Which gives you a line per caller-callee like this:

main calls svn_error_t * init_apr(apr_pool_t **ppool, int *argc, char const *const **argv)

You'll want to tweak that XSLT and then partition that directed graph in the way that cuts across the fewest edges. Woo hoo, an NP-complete problem! Luckily, there are lots of papers on the subject, some are here: http://www.sandia.gov/~bahendr/partitioning.html