c++ clang libclang

Libclang c++ cannot parse nested template field declaration


We want to parse the following source file:

#pragma once

#include "Vec3.h"

template<typename T> struct Array;

struct S1
{
    int S1i;
    Array<Array<Vec3>> S1Grid;
};

struct S2
{
    int S2i;
    Array<Array<Vec3> > S2Grid;
};

struct S3
{
    int S3i;
    Array<Array<char>> S3Grid;
};

Using the following parser code:

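#include <clang-c/Index.h>
#include <cstdio>
#include <string>

static CXChildVisitResult CursorVisitorTest(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
    CXCursorKind Kind = clang_getCursorKind(cursor);
    // The CXString returned by clang_getCursorSpelling must be disposed by the caller.
    CXString Spelling = clang_getCursorSpelling(cursor);
    printf("%d %s\n", Kind, clang_getCString(Spelling));
    clang_disposeString(Spelling);
    return CXChildVisit_Recurse;
}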

void Test()
{
    CXIndex index = clang_createIndex(0, 0);
    std::string header_path = "Example.h";
    CXTranslationUnit TranslationUnit;
    static const char* args[] = { "-std=c++17", "-xc++", "-DHEADER_TOOL" };
    CXErrorCode error = clang_parseTranslationUnit2(
        index,
        header_path.c_str(),
        args,
        3,
        nullptr,
        0,
        CXTranslationUnit_SingleFileParse,
        &TranslationUnit
    );
    if (error == CXError_Success && TranslationUnit != nullptr)
    {
        CXCursor cursor = clang_getTranslationUnitCursor(TranslationUnit);
        clang_visitChildren(cursor, &CursorVisitorTest, nullptr);
        clang_disposeTranslationUnit(TranslationUnit);
    }
    // Release the index created at the top of the function.
    clang_disposeIndex(index);
}

We obtained the following output:

31 Array
27 T
2 S1
6 S1i
2 S2
6 S2i
6 S2Grid
2 S3
6 S3i
6 S3Grid
45 Array
45 Array

We observed that Clang fails to parse S1Grid as a field. Since S2Grid is parsed properly, we suspect that the >> in S1Grid's type is being parsed as a right shift. Interestingly, S3Grid is also parsed properly, probably because char is a built-in type and Vec3 is not. What can we do to make libclang parse the nested template correctly without manually adding a space in the source?

Clang version returned by clang_getClangVersion is clang version 9.0.0 (tags/RELEASE_900/final)


Solution

  • The primary issue is error recovery

    The input file, Example.h, contains syntax errors. First, there is an #include of Vec3.h that is not found. If that is fixed (by just commenting it out) then Clang reports the additional errors, specifically that Vec3 is an undeclared identifier, which is what is causing the problem.

    Because there are syntax errors, Clang attempts to provide a "best effort" parse of what the code is supposed to mean, but it necessarily must use some heuristics to deal with the errors, and those heuristics are imperfect. When using Clang 9, and this particular input, the recovery heuristics evidently fail to recognize the >> as the closing delimiter of nested template-ids, and consequently S1Grid is not recorded as a field in the resulting AST.

    Solution #1: Fix the input code

    Although Clang can be used to parse code containing syntax errors, and is regularly used in that way for IDE support, that isn't what it was originally designed to do, and in any case the heuristics are always going to be imperfect. So the simplest solution is to ensure that the code is free of syntax errors before parsing it.

    If that is how you want to proceed, then you'll want to adjust the way you invoke Clang to check for and report syntax errors. See the question Is there a way to get a meaningfull error message when compiling code through libclang? for details on how to do that.

    Solution #2: Upgrade to a later version of Clang

    For this example input, Clang 9 does not report the S1Grid field, but Clang 11, 14, and 16 (all of the others I tested) do. Evidently the error recovery heuristics improved for this case. Again, there's no guarantee those versions will do the right thing on every input, but for this one, they do.