
How to extract comments and match to declaration with RecursiveASTVisitor in libclang c++?


I am writing a utility which is supposed to parse C++ (and C) header files, extract the structs, enums, fields etc. and generate code in other languages based on the extracted information. I decided to use libclang for this.

I'm using a RecursiveASTVisitor and it seems I'm able to extract all the information I need, except for comments.

I want to read the comment that appears right above each declaration (field, struct, class, enum) and include its text when I generate the code in other languages.

The problem is that all the samples I have seen that deal with comments use CXCursor and the C interface of clang, and I have no idea how to get a CXCursor in my context.

So - how can I extract comments while still using RecursiveASTVisitor?


Solution

  • After some more digging, I found this:

    For any relevant visited Decl (VisitXXXDecl), I can do this:

    virtual bool VisitDecl(Decl* d)
    {
        ASTContext& ctx = d->getASTContext();
        SourceManager& sm = ctx.getSourceManager();
    
        // Look up the documentation comment attached to this declaration, if any.
        const RawComment* rc = ctx.getRawCommentForDeclNoCache(d);
        if (rc)
        {
            // Found comment!
            SourceRange range = rc->getSourceRange();
    
            // Presumed locations give file, line and column of the comment.
            PresumedLoc startPos = sm.getPresumedLoc(range.getBegin());
            PresumedLoc endPos = sm.getPresumedLoc(range.getEnd());
    
            // getRawText() returns a StringRef; .str() makes an owning copy.
            std::string raw = rc->getRawText(sm).str();
            std::string brief = rc->getBriefText(ctx);
    
            // ... Do something with positions or comments
        }
    
        // ...
        return true; // returning false would stop the traversal
    }
    

    Note that, as far as I could see, this only picks up comments that sit on the line(s) directly above the declaration (adjacent to it, with no gap in between) and that are written in one of the following formats:

    • /// Comment
    • /** Comment */
    • //! Comment

    For example, in the following case:

    /// A field with a long long comment
    /// A two-liner
    long long LongLongData;
    

    raw will be:

    /// A field with a long long comment
        /// A two-liner
    

    And brief will be:

    A field with a long long comment A two-liner
    

    Either way, it's good enough for my needs.
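
    Since raw still carries the comment markers and the source indentation of its continuation lines, it usually needs a little cleanup before it can be re-emitted in another language. The following is only a minimal sketch of that step, using nothing but the standard library. The helper names stripCommentMarkers and asLineComment are hypothetical (they are not part of clang), and the sketch only handles the three comment formats listed above plus the decorative '*' that often starts the inner lines of a block comment.

```cpp
#include <sstream>
#include <string>

namespace {

// Trims spaces, tabs and carriage returns from both ends of s.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Returns true if s begins with prefix.
bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

}  // namespace

// Hypothetical helper (not a clang API): removes "///", "//!", "/**" and "*/"
// markers from raw comment text and joins the non-empty lines with '\n'.
std::string stripCommentMarkers(const std::string& raw) {
    std::istringstream in(raw);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        std::string text = trim(line);

        // Drop the "*/" that closes a block comment.
        if (text.size() >= 2 && text.compare(text.size() - 2, 2, "*/") == 0) {
            text.erase(text.size() - 2);
        }

        // Drop the opening marker, or the leading '*' of an inner block line.
        if (startsWith(text, "///") || startsWith(text, "//!") ||
            startsWith(text, "/**")) {
            text.erase(0, 3);
        } else if (!text.empty() && text[0] == '*') {
            text.erase(0, 1);
        }

        text = trim(text);
        if (text.empty()) continue;  // skip lines that held only markers
        if (!out.empty()) out += '\n';
        out += text;
    }
    return out;
}

// Hypothetical helper: re-emits cleaned text as line comments of the target
// language, e.g. prefix "# " for Python or "// " for Java.
std::string asLineComment(const std::string& text, const std::string& prefix) {
    std::istringstream in(text);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        out += prefix;
        out += line;
        out += '\n';
    }
    return out;
}
```

    With the two-line example above, stripCommentMarkers(raw) gives the two comment lines without their /// markers or indentation, and asLineComment(..., "# ") then prefixes each of them with "# ". If a single line is enough, brief can be passed to asLineComment directly.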