c++ c winapi unicode-string wchar

UNICODE_STRING to wchar_t* null terminated


I want to use the buffer from a UNICODE_STRING, but it seems I cannot use it directly by just copying the pointer, because sometimes there are null characters in the middle of the string, and Length is greater than what I see in the debugger. So if I do this

UNICODE_STRING testStr;
//after being used by some function it has data like this 'bad丣\0more_stuff\0'

wchar_t * wStr = testStr.Buffer;

I end up with wStr = "bad丣". Is there a way to convert this to a valid, null-terminated wchar_t*?


Solution

  • A wchar_t* is just a pointer. Unless you tell the debugger (or any function you pass the wchar_t* to) exactly how many wchar_t characters are actually being pointed at, it has to stop somewhere, so it stops on the first null character it encounters.

    UNICODE_STRING::Buffer is not guaranteed to be null-terminated, and it can contain embedded nulls. The UNICODE_STRING::Length field holds the size of the data in bytes, so divide it by sizeof(WCHAR) to get the number of WCHAR elements in the Buffer. That count includes embedded nulls but does not include a trailing null terminator, if one is present. If you need a null terminator, copy the Buffer data to your own buffer and append a terminator.
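    As a rough illustration of the manual copy described above, here is a minimal sketch. UnicodeStringLike is a hypothetical stand-in that mirrors the layout of UNICODE_STRING so the code compiles without the Windows headers, and copyWithTerminator is a made-up helper name, not a Windows API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for UNICODE_STRING (assumption: same field meanings;
// the real struct uses USHORT and PWSTR).
struct UnicodeStringLike {
    unsigned short Length;        // bytes in use, not counting any terminator
    unsigned short MaximumLength; // bytes allocated for Buffer
    wchar_t *Buffer;              // not necessarily null-terminated
};

// Copies the counted data into a new buffer and appends a null terminator.
// Embedded nulls are copied as-is.
std::unique_ptr<wchar_t[]> copyWithTerminator(const UnicodeStringLike &us)
{
    assert(us.Length % sizeof(wchar_t) == 0); // Length is a byte count
    const std::size_t count = us.Length / sizeof(wchar_t);

    std::unique_ptr<wchar_t[]> out(new wchar_t[count + 1]);
    if (count > 0)
        std::memcpy(out.get(), us.Buffer, us.Length);
    out[count] = L'\0'; // terminator goes after the last counted element
    return out;
}
```

    With the real structure you would read the same Length and Buffer fields from testStr. The returned buffer is valid as a null-terminated wchar_t*, but anything that relies on the terminator will still stop at the first embedded null.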

    The easiest way to do that is to use std::wstring, for example:

    #include <string>
    
    UNICODE_STRING testStr;
    // fill testStr as needed...
    
    std::wstring wStrBuf(testStr.Buffer, testStr.Length / sizeof(WCHAR));
    const wchar_t *wStr = wStrBuf.c_str();
    

    The embedded nulls will still be present, but c_str() will give you a trailing null terminator. The debugger will still display the data only up to the first null, unless you tell it how many WCHAR elements the data actually contains.
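    To make the difference between the stored length and the "visible" length concrete, here is a small sketch. UnicodeStringLike is again a hypothetical stand-in for UNICODE_STRING, and toWString and visibleLength are illustrative helper names, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for UNICODE_STRING (assumption: same field meanings).
struct UnicodeStringLike {
    unsigned short Length;        // bytes in use, not counting any terminator
    unsigned short MaximumLength; // bytes allocated for Buffer
    wchar_t *Buffer;              // not necessarily null-terminated
};

// Builds a std::wstring from the counted data, keeping embedded nulls.
std::wstring toWString(const UnicodeStringLike &us)
{
    assert(us.Length % sizeof(wchar_t) == 0); // Length is a byte count
    const std::size_t count = us.Length / sizeof(wchar_t);
    if (count == 0)
        return std::wstring();
    return std::wstring(us.Buffer, count);
}

// Number of characters a null-terminated consumer would see:
// everything up to, but not including, the first null.
std::size_t visibleLength(const std::wstring &s)
{
    return std::wcslen(s.c_str());
}
```

    For data such as 'bad\0more', size() reports all 8 elements while visibleLength() reports 3, which matches what the debugger shows by default. If only the first substring is wanted, s.substr(0, visibleLength(s)) would keep just that part.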

    Alternatively, if you know the Buffer data contains multiple substrings separated by nulls, you could split the Buffer data into an array of strings instead, for example:

    #include <string>
    #include <vector>
    
    UNICODE_STRING testStr;
    // fill testStr as needed...
    
    std::vector<std::wstring> wStrArr;
    
    std::wstring wStr(testStr.Buffer, testStr.Length / sizeof(WCHAR));
    std::wstring::size_type startidx = 0;
    do
    {
        std::wstring::size_type idx = wStr.find(L'\0', startidx);
        if (idx == std::wstring::npos)
        {
            if (startidx < wStr.size())
            {
                if (startidx > 0)
                    wStrArr.push_back(wStr.substr(startidx));
                else
                    wStrArr.push_back(wStr);
            }
            break;
        }
        wStrArr.push_back(wStr.substr(startidx, idx-startidx));
        startidx = idx + 1;
    }
    while (true);
    
    // use wStrArr as needed...
    

    Or:

    #include <algorithm>
    #include <string>
    #include <vector>
    
    UNICODE_STRING testStr;
    // fill testStr as needed...
    
    std::vector<std::wstring> wStrArr;
    
    WCHAR *pStart = testStr.Buffer;
    WCHAR *pEnd = pStart + (testStr.Length / sizeof(WCHAR));
    
    do
    {
        WCHAR *pFound = std::find(pStart, pEnd, L'\0');
        if (pFound == pEnd)
        {
            if (pStart < pEnd)
                wStrArr.push_back(std::wstring(pStart, pEnd-pStart));
            break;
        }
        wStrArr.push_back(std::wstring(pStart, pFound-pStart));
        pStart = pFound + 1;
    }
    while (true);
    
    // use wStrArr as needed...