
How to pull .natvis data out of a PDB?


The /NATVIS linker option can be used to embed debug visualizers into a PDB.

Given a PDB, is there a way to recover all embedded debug visualizers? I'm looking for a first-party tool (like DUMPBIN) or, if no such tool can do it, a solution based on a first-party API (like DIA).


Solution

  • As noted in a comment, the ability to store .natvis files in a PDB is implemented by reusing the infrastructure for embedding arbitrary source files in a PDB.

    The exercise thus comes down to parsing the respective tables in a PDB and filtering relevant entries. Thankfully, the parsing is already done for us by the Debug Interface Access SDK (DIA SDK) that ships with Visual Studio¹. What's left is navigating the reference documentation to discover applicable building blocks.

    Strategy

    The following steps solve the problem:

    1. Set up the build environment
    2. Construct an IDiaDataSource interface
    3. Ingest a PDB into the data source and initiate an IDiaSession
    4. Find an IDiaEnumInjectedSources table
    5. Visit each IDiaInjectedSource row and extract relevant data
    6. [optional] Pull it all together

    Build Environment

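    The DIA SDK ships with Visual Studio, but from the project's point of view it is still a 3rd-party library, so the usual ordeal of setting things up is due. I covered the prerequisite steps elsewhere; in essence, the compiler and linker need to be able to find the DIA SDK's headers and libraries.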

    With the proposed changes applied, the following program should successfully compile and link:

    #include "dia2.h"
    #pragma comment(lib, "diaguids")
    
    int main() { auto const clsid { CLSID_DiaSource }; }
    

    Construct IDiaDataSource

    This should be as simple as following the official example. However, it is not. The following program fails with a REGDB_E_CLASSNOTREG error code:

    #include "dia2.h"
    #pragma comment(lib, "diaguids")
    
    #include <objbase.h>
    
    int main()
    {
        ::CoInitialize(nullptr);
    
        IDiaDataSource* pSource;
        HRESULT hr = ::CoCreateInstance(CLSID_DiaSource,
                                        nullptr,
                                        CLSCTX_INPROC_SERVER,
                                        IID_IDiaDataSource,
                                        (void**)&pSource);
        if (FAILED(hr))
            throw hr;
    }
    

    There isn't anything inherently wrong with this code. It follows the standard pattern for in-proc COM server activation. The issue is that the COM server isn't registered (on my machine, anyway²). The documentation lists "msdia80.dll" (VS 2005); things apparently changed between then and "msdia140.dll" (VS 2015+), so what was right once is wrong now.
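
    Since the program above throws the bare HRESULT, a failing run may only surface a raw number. The following portable sketch shows how that number breaks down into its fields. The constant matches the value of REGDB_E_CLASSNOTREG in the Windows SDK's winerror.h; the `HresultParts` struct and the `decode_hresult` function are hypothetical helpers made up for this illustration and are not part of any SDK.

```cpp
#include <cassert>
#include <cstdint>

// Value of REGDB_E_CLASSNOTREG as defined in the Windows SDK (winerror.h).
// Spelled out here so that the sketch compiles without Windows headers.
constexpr std::uint32_t regdb_e_classnotreg = 0x80040154u;

// Hypothetical helper type: the three HRESULT fields of interest.
struct HresultParts
{
    bool failure;      // bit 31: set for failure codes
    unsigned facility; // bits 16..26: the component that issued the code
    unsigned code;     // bits 0..15: the facility-specific status code
};

// Hypothetical helper: splits an HRESULT into its fields.
constexpr HresultParts decode_hresult(std::uint32_t hr) noexcept
{
    return { (hr >> 31) != 0, (hr >> 16) & 0x7FFu, hr & 0xFFFFu };
}

// Usage: REGDB_E_CLASSNOTREG is a failure code in FACILITY_ITF (4).
inline void usage_example()
{
    auto const parts = decode_hresult(regdb_e_classnotreg);
    assert(parts.failure);
    assert(parts.facility == 4u);
    assert(parts.code == 0x154u);
}
```

    In a Windows build the same information is available through the `FAILED`, `HRESULT_FACILITY`, and `HRESULT_CODE` macros.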

    I didn't spend a whole bunch of time trying to register the COM server or investigating the use of side-by-side assembly manifests, or fooling about with Activation Contexts to trick the COM infrastructure into discovering "msdia140.dll".

    Either of the above may well be more correct, though I settled for using an undocumented function provided by "diaguids.lib" (and declared in "diacreate.h") instead:

    HRESULT NoRegCoCreate(const wchar_t *dllName,
                          REFCLSID   rclsid,
                          REFIID     riid,
                          void     **ppv);
    

    This looks like a (homebrew) version of registration-free COM, which is good enough for now. The following program successfully executes:

    #include "dia2.h"
    #include "diacreate.h"
    #pragma comment(lib, "diaguids")
    
    #include <objbase.h>
    
    int main()
    {
        ::CoInitialize(nullptr);
    
        IDiaDataSource* pSource;
        HRESULT hr = ::NoRegCoCreate(L"msdia140.dll",
                                     CLSID_DiaSource,
                                     IID_IDiaDataSource,
                                     (void**)&pSource);
        if (FAILED(hr))
            throw hr;
    }
    

    This loads "msdia140.dll" from "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\msdia140.dll" on my system, so there may be additional dependencies which I didn't investigate.

    Initiate IDiaSession

    The IDiaSession is at the center of the DIA SDK. It is the pivot point for all queries against the symbol store managed by the IDiaDataSource. Initiating a session is as simple as calling IDiaDataSource::openSession():

    /// @brief Initiates a DIA session for queries against a PDB.
    ///
    /// @param pdb_file Fully qualified pathname to the PDB file.
    ///
    /// @return Returns an `IDiaSession` smart pointer on success. Errors are
    ///         reported via C++ exceptions.
    ///
    [[nodiscard]] wil::com_ptr<IDiaSession>
    session_from_pdb(fs::path const& pdb_file)
    {
        wil::com_ptr<IDiaDataSource> source {};
        THROW_IF_FAILED(::NoRegCoCreate(L"msdia140.dll", CLSID_DiaSource,
                                        IID_PPV_ARGS(&source)));
    
        THROW_IF_FAILED(source->loadDataFromPdb(pdb_file.c_str()));
    
        wil::com_ptr<IDiaSession> session {};
        THROW_IF_FAILED(source->openSession(&session));
    
        return session;
    }
    

    The code uses the Windows Implementation Libraries (WIL) for convenient resource management and error handling, and `fs` is a namespace alias for `std::filesystem`. A C++17 compiler is required due to the use of the filesystem library.

    Find the source table

    "Streams" in the PDB file format are represented as "tables" in the DIA SDK. IDiaSession::getEnumTables() returns an iterator over all tables, where IDiaEnumTables::Next() returns a generic IDiaTable interface for each entry. A call to QueryInterface() allows us to discover the specific table type. We are interested in the IDiaEnumInjectedSources table specifically, so that's what the code is requesting. Since there can be at most one such table³, we can return early once it is identified:

    /// @brief Attempts to find the "injected sources" table.
    ///
    /// @param session The `IDiaSession` to use for the query. The caller must
    ///                ensure that the pointer is valid for the duration of the
    ///                call. Ownership remains with the caller.
    ///
    /// @return Returns an `IDiaEnumInjectedSources` interface if found, a
    ///         `std::nullopt` otherwise. Errors are reported via C++ exceptions.
    ///
    [[nodiscard]] std::optional<wil::com_ptr<IDiaEnumInjectedSources>>
    get_source_iterator(IDiaSession* session)
    {
        THROW_HR_IF_NULL(E_POINTER, session);
    
        wil::com_ptr<IDiaEnumTables> tables {};
        THROW_IF_FAILED(session->getEnumTables(&tables));
    
        ULONG celt {};
        wil::com_ptr<IDiaTable> table {};
        while (tables->Next(1, &table, &celt) == S_OK && celt == 1)
        {
            // Check whether the table implements the `IDiaEnumInjectedSources`
            // interface
            auto const pdb_table = table.try_query<IDiaEnumInjectedSources>();
            if (pdb_table)
            {
                // Found the table, so let's stop looking and return it
                return pdb_table;
            }
        }
    
        return {};
    }
    

    Extract source data

    With an IDiaEnumInjectedSources iterator, we can reuse the pattern above to discover an IDiaInjectedSource interface for each entry and extract the relevant information (file name and source code bytes):

    /// @brief Stores source file name and corresponding data.
    ///
    struct Source
    {
        fs::path name;
        std::vector<unsigned char> data;
    };
    
    /// @brief Extracts source data for each entry in the "injected sources" table.
    ///
    /// @param source_it The iterator over the "injected sources" table. The caller
    ///                  must ensure that the pointer is valid for the duration of
    ///                  the call. Ownership remains with the caller.
    ///
    /// @return Returns a list of `Source` objects on success. Errors are reported
    ///         via C++ exceptions.
    ///
    [[nodiscard]] std::list<Source> get_sources(IDiaEnumInjectedSources* source_it)
    {
        THROW_HR_IF_NULL(E_POINTER, source_it);
    
        std::list<Source> src_list {};
    
        ULONG celt {};
        wil::com_ptr<IDiaInjectedSource> source {};
        while (source_it->Next(1, &source, &celt) == S_OK && celt == 1)
        {
            ULONGLONG length {};
            wil::unique_bstr filename {};
            if (source->get_length(&length) == S_OK
                && source->get_filename(&filename) == S_OK)
            {
                std::vector<unsigned char> data(length, {});
                DWORD bytes_written {};
                if (source->get_source(static_cast<DWORD>(data.size()),
                                       &bytes_written, data.data())
                        == S_OK
                    && bytes_written == data.size())
                {
                    src_list.emplace_back(
                        Source { filename.get(), std::move(data) });
                }
            }
        }
    
        return src_list;
    }
    

    This is rather straightforward. However, there are a few points worth mentioning:

    Injected source files can be compressed. IDiaInjectedSource::get_sourceCompression() returns a loosely specified value, where 0 means "no compression". Other values are possible but their meaning is specific to the tool responsible for generating the PDB. More work is required if you plan to interpret the source data.

    The IDiaInjectedSource interface also doesn't offer a way to identify the type of source it refers to. I dumped the IDiaPropertyStorage key/value pairs as well to make sure I wasn't overlooking something, but that didn't turn up anything useful either (at least for my test input). The file name extension thus serves as the only hint.
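
    The two caveats above can be folded into a single filter. The following is a minimal, portable sketch under stated assumptions: `RawSource` is a hypothetical record (not a DIA SDK type) whose `compression` member is assumed to hold the value reported by IDiaInjectedSource::get_sourceCompression(), and `has_natvis_extension` and `is_plain_natvis` are made-up helper names. Comparing the extension case-insensitively is my own choice, on the assumption that file names on Windows are generally treated that way; the program below does an exact comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical record for one injected source. `compression` is assumed to
// carry the value obtained from `IDiaInjectedSource::get_sourceCompression()`.
struct RawSource
{
    fs::path name;
    unsigned long compression;
    std::vector<unsigned char> data;
};

// Returns `true` if the file name ends in ".natvis", ignoring ASCII case.
inline bool has_natvis_extension(fs::path const& name)
{
    auto ext = name.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext == ".natvis";
}

// Returns `true` only for entries that can be written out as-is: the name
// hints at a .natvis file and the data is not compressed (value 0).
inline bool is_plain_natvis(RawSource const& src)
{
    return src.compression == 0 && has_natvis_extension(src.name);
}

// Usage: only the first entry passes both checks.
inline void usage_example()
{
    RawSource const plain { "Types.NATVIS", 0, {} };
    RawSource const packed { "types.natvis", 1, {} };
    assert(is_plain_natvis(plain));
    assert(!is_plain_natvis(packed));
}
```

    Entries rejected because of a non-zero compression value are not necessarily useless; they just need tool-specific decoding before the bytes mean anything.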

    Full program

    With everything covered, it would be a waste not to pull it all together into a program. The following compiles to a command line utility that takes a PDB file and an output directory as parameters and dumps all .natvis files found in the PDB:

    #include <Windows.h>
    
    #include <combaseapi.h>
    
    #include "dia2.h"
    #include "diacreate.h"
    #pragma comment(lib, "diaguids.lib")
    
    #include <wil/com.h>
    #include <wil/resource.h>
    #include <wil/result.h>
    
    #include <cstdlib>
    #include <cwchar>
    #include <filesystem>
    #include <fstream>
    #include <list>
    #include <optional>
    #include <utility>
    #include <vector>
    
    namespace fs = std::filesystem;
    
    
    /// @brief Initiates a DIA session for queries against a PDB.
    ///
    /// @param pdb_file Fully qualified pathname to the PDB file.
    ///
    /// @return Returns an `IDiaSession` smart pointer on success. Errors are
    ///         reported via C++ exceptions.
    ///
    [[nodiscard]] wil::com_ptr<IDiaSession>
    session_from_pdb(fs::path const& pdb_file)
    {
        wil::com_ptr<IDiaDataSource> source {};
        THROW_IF_FAILED(::NoRegCoCreate(L"msdia140.dll", CLSID_DiaSource,
                                        IID_PPV_ARGS(&source)));
    
        THROW_IF_FAILED(source->loadDataFromPdb(pdb_file.c_str()));
    
        wil::com_ptr<IDiaSession> session {};
        THROW_IF_FAILED(source->openSession(&session));
    
        return session;
    }
    
    /// @brief Attempts to find the "injected sources" table.
    ///
    /// @param session The `IDiaSession` to use for the query. The caller must
    ///                ensure that the pointer is valid for the duration of the
    ///                call. Ownership remains with the caller.
    ///
    /// @return Returns an `IDiaEnumInjectedSources` interface if found, a
    ///         `std::nullopt` otherwise. Errors are reported via C++ exceptions.
    ///
    [[nodiscard]] std::optional<wil::com_ptr<IDiaEnumInjectedSources>>
    get_source_iterator(IDiaSession* session)
    {
        THROW_HR_IF_NULL(E_POINTER, session);
    
        wil::com_ptr<IDiaEnumTables> tables {};
        THROW_IF_FAILED(session->getEnumTables(&tables));
    
        ULONG celt {};
        wil::com_ptr<IDiaTable> table {};
        while (tables->Next(1, &table, &celt) == S_OK && celt == 1)
        {
            // Check whether the table implements the `IDiaEnumInjectedSources`
            // interface
            auto const pdb_table = table.try_query<IDiaEnumInjectedSources>();
            if (pdb_table)
            {
                // Found the table, so let's stop looking and return it
                return pdb_table;
            }
        }
    
        return {};
    }
    
    /// @brief Stores source file name and corresponding data.
    ///
    struct Source
    {
        fs::path name;
        std::vector<unsigned char> data;
    };
    
    /// @brief Extracts source data for each entry in the "injected sources" table.
    ///
    /// @param source_it The iterator over the "injected sources" table. The caller
    ///                  must ensure that the pointer is valid for the duration of
    ///                  the call. Ownership remains with the caller.
    ///
    /// @return Returns a list of `Source` objects on success. Errors are reported
    ///         via C++ exceptions.
    ///
    [[nodiscard]] std::list<Source> get_sources(IDiaEnumInjectedSources* source_it)
    {
        THROW_HR_IF_NULL(E_POINTER, source_it);
    
        std::list<Source> src_list {};
    
        ULONG celt {};
        wil::com_ptr<IDiaInjectedSource> source {};
        while (source_it->Next(1, &source, &celt) == S_OK && celt == 1)
        {
            ULONGLONG length {};
            wil::unique_bstr filename {};
            if (source->get_length(&length) == S_OK
                && source->get_filename(&filename) == S_OK)
            {
                std::vector<unsigned char> data(length, {});
                DWORD bytes_written {};
                if (source->get_source(static_cast<DWORD>(data.size()),
                                       &bytes_written, data.data())
                        == S_OK
                    && bytes_written == data.size())
                {
                    src_list.emplace_back(
                        Source { filename.get(), std::move(data) });
                }
            }
        }
    
        return src_list;
    }
    
    int wmain(int argc, wchar_t* argv[])
    {
        if (argc != 3 || fs::path { argv[2] }.has_filename())
        {
            ::wprintf(L"Usage: DumpNatvis <pdb file> <out dir>\n");
            return EXIT_FAILURE;
        }
    
        // Make sure the output directory exists
        fs::path const out_dir { argv[2] };
        fs::create_directories(out_dir);
    
        THROW_IF_FAILED(::CoInitialize(nullptr));
    
        auto const session = session_from_pdb(argv[1]);
        auto const source_it = get_source_iterator(session.get());
        if (source_it)
        {
            auto const src_list = get_sources(source_it.value().get());
            for (auto const& src : src_list)
            {
                // Filter .natvis data
                if (src.name.extension() == L".natvis")
                {
                    auto const path_name = out_dir / src.name.filename();
                    auto file = std::ofstream(
                        path_name, std::ofstream::out | std::ofstream::trunc
                                       | std::ofstream::binary);
                    file.write(reinterpret_cast<char const*>(src.data.data()),
                               src.data.size());
                }
            }
        }
    }
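
    One detail of the usage check in wmain() is easy to miss: it rejects an output directory that has a filename component, so the second argument must end in a directory separator. The following portable sketch illustrates that behavior of `std::filesystem::path`. The `as_directory` helper is a hypothetical addition that is not part of the program above, and the forward slashes are used only to keep the sketch portable.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns `p` with a trailing directory separator.
// Appending an empty path adds the separator only if `p` has a filename
// component, so paths that already end in a separator are left unchanged.
inline fs::path as_directory(fs::path p)
{
    return p / "";
}

// Usage: "out" names a file as far as `has_filename()` is concerned,
// whereas "out/" does not.
inline void usage_example()
{
    assert(fs::path { "out" }.has_filename());
    assert(!fs::path { "out/" }.has_filename());
    assert(!as_directory("out").has_filename());
}
```

    A more forgiving variant of the program could normalize the argument this way instead of printing the usage message, at the cost of no longer catching a mistyped file path.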
    

    ¹ It is included with the Desktop development with C++ workload in the Visual Studio Installer.

    ² And someone else's machine, too.

    ³ Based on a comment in the official example code. Hopefully this statement is (still) true.