c# roslyn-code-analysis lines-of-code

How do I calculate the code-to-comment ratio of a C# project?


Note: I am not asking what the ideal code-to-comment ratio is, nor am I trying to impose a particular ratio on our team. Instead, we would like to improve how well our codebase is documented (we started with a "code should document itself" mentality). That can be done either by removing dead code or by adding comments to live code, and we would like to track our progress by measuring this ratio several times over the course of several months. Also note that I want to measure the comments we actually have, so anything that derives LOC from the generated IL won't work.

How would I go about getting the code-to-comments ratio for a C# project? Would I need to write my own parsing script, or is there something in Roslyn I can leverage? Do any major IDEs carry this functionality directly? As a bonus, can I filter out "punctuation", such as extra whitespace, comment delimiters (// and /* */), and opening/closing curly brackets?


Solution

  • Using Robert Harvey's regex, I wrote a short C# method that calculates this metric from an input string. It counts characters rather than lines so that lines containing both code and comments are accounted for properly, and it excludes redundant whitespace from the metric so that things like line indentation don't count.

    To prevent catastrophic backtracking, I simplified the regex (the newline checks turned out to be unnecessary, since the negated character classes already match newlines) and also made the body of the block comment an atomic (non-backtracking) group.

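    // Requires: using System.Text.RegularExpressions;
    public static double CodeToCommentRatio(
        string text,
        out int codeChars,
        out int commentChars,
        out int blankChars)
    {
        // First, filter out excess whitespace, reporting the number of characters removed this way.
        // "$1" puts back the captured line break (or nothing at the very start of the text),
        // so stripping indentation from the first line does not insert a spurious newline.
        Regex lineStartRegex = new Regex(@"(^|\n)[ \t]+");
        Regex blanksRegex = new Regex(@"[ \t]+");
        string minWhitespaceText = blanksRegex.Replace(lineStartRegex.Replace(text, "$1"), " ");
        blankChars = text.Length - minWhitespaceText.Length;

        // Then, match all comments and report the number of characters in comments.
        // The block-comment body is an atomic group (?>...) to avoid catastrophic backtracking.
        // Caveat: comment markers inside string literals (e.g. "http://...") are counted as comments too.
        Regex commentsRegex = new Regex(@"(/\*(?>[^*]|(\*+[^*/]))*\*+/)|(//.*)");
        MatchCollection comments = commentsRegex.Matches(minWhitespaceText);
        commentChars = 0;
        foreach (Match comment in comments)
        {
            commentChars += comment.Length;
        }
        codeChars = minWhitespaceText.Length - commentChars;

        // Finally, return the ratio. This is floating-point division, so a text without comments
        // yields double.PositiveInfinity (or double.NaN if it contains no code either).
        return (double)codeChars / commentChars;
    }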
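
Since the goal in the question is a ratio for a whole project rather than a single string, the same regex-based counting could be summed over every source file. The following is only a sketch under a few assumptions: the class name `ProjectCommentStats` and its helpers `Count` and `CountDirectory` are hypothetical, every `*.cs` file under the given directory is included (generated files under `obj` as well, unless you filter them out), and the regex caveats above still apply.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class ProjectCommentStats
{
    // Patterns follow the regex-based approach above; built once and reused for every file.
    private static readonly Regex LineStart = new Regex(@"(^|\n)[ \t]+");
    private static readonly Regex Blanks = new Regex(@"[ \t]+");
    private static readonly Regex Comments =
        new Regex(@"(/\*(?>[^*]|(\*+[^*/]))*\*+/)|(//.*)");

    // Counts code and comment characters in one source string,
    // after stripping indentation and collapsing runs of spaces/tabs.
    public static (int Code, int Comment) Count(string text)
    {
        string min = Blanks.Replace(LineStart.Replace(text, "$1"), " ");
        int comment = 0;
        foreach (Match m in Comments.Matches(min))
        {
            comment += m.Length;
        }
        return (min.Length - comment, comment);
    }

    // Sums the counts over every .cs file below root (hypothetical helper).
    public static (int Code, int Comment) CountDirectory(string root)
    {
        int code = 0, comment = 0;
        foreach (string path in Directory.EnumerateFiles(
                     root, "*.cs", SearchOption.AllDirectories))
        {
            var (c, k) = Count(File.ReadAllText(path));
            code += c;
            comment += k;
        }
        return (code, comment);
    }
}
```

Summing the raw counts and dividing once at the end weights each file by its size, which is probably what you want when tracking the metric over several months. Check for a comment count of zero before dividing `Code` by `Comment`.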