c++ glibc

Does glibc wcslen() expect data alignment along wchar_t sized boundaries?


Been wracking my brain for hours on this one. glibc wcslen() is returning a different value than expected for a given input string. I've narrowed the problem to a possible data alignment issue but even that doesn't make sense to me. My own function that dumps the string appears to work fine and also calculates the size similar to how wcslen() supposedly works.

    #ifndef WCHAR
        #define WCHAR wchar_t
    #endif

...

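void DumpWCStr(const WCHAR *str)
{
    size_t len = 0, len2 = wcslen(str);

    while (*str != L'\0')
    {
        // %lu expects unsigned long and %lc expects wint_t, so cast explicitly.
        printf("%lu %lc\n", (unsigned long)*str, (wint_t)*str);

        str++;
        len++;
    }

    // %zu is the portable conversion for size_t.
    printf("Size:  %zu (wcslen:  %zu)\n", len, len2);
}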

void TestFunc()
{
    char *prebuffer = (char *)malloc(100 * sizeof(WCHAR) + 1);
    // Case 1: buffer starts one byte past the malloc() result.
    WCHAR *tempbuffer = (WCHAR *)(prebuffer + 1);
    WCHAR tempbuffer2[100];

    memset(prebuffer, 0xFF, 100 * sizeof(WCHAR) + 1);
    swprintf(tempbuffer, 100, L"%ls (%d)", L"test", 15);
    DumpWCStr(tempbuffer);

    // Case 2: buffer starts exactly at the malloc() result.
    memset(prebuffer, 0xFF, 100 * sizeof(WCHAR) + 1);
    tempbuffer = (WCHAR *)prebuffer;
    swprintf(tempbuffer, 100, L"%ls (%d)", L"test", 15);
    DumpWCStr(tempbuffer);

    // Case 3: ordinary WCHAR array on the stack.
    memset(prebuffer, 0xFF, 100 * sizeof(WCHAR) + 1);
    swprintf(tempbuffer2, 100, L"%ls (%d)", L"test", 15);
    DumpWCStr(tempbuffer2);

    free(prebuffer);
}

Outputs:

116 t
101 e
115 s
116 t
32
40 (
49 1
53 5
41 )
Size:  9 (wcslen:  8)
116 t
101 e
115 s
116 t
32
40 (
49 1
53 5
41 )
Size:  9 (wcslen:  9)
116 t
101 e
115 s
116 t
32
40 (
49 1
53 5
41 )
Size:  9 (wcslen:  9)

The glibc source shows wcslen() implemented as:

size_t
__wcslen (const wchar_t *s)
{
  size_t len = 0;

  while (s[len] != L'\0')
    {
      if (s[++len] == L'\0')
        return len;
      if (s[++len] == L'\0')
        return len;
      if (s[++len] == L'\0')
        return len;
      ++len;
    }

  return len;
}

Attempting to printf("%ls\n", tempbuffer); after the first swprintf() results in:

wcsrtombs.c:94: __wcsrtombs: Assertion `data.__outbuf[-1] == '\0'' failed.

Which probably happens because __wcslen() is returning 8 instead of 9 inside __wcsrtombs().

I'm compiling the code as C++ and the target is Intel x86/x64.

So is glibc wcslen() expecting data alignment on wchar_t sized boundaries? That's not how I read the source code for wcslen() but it certainly is acting like it expects data alignment.


Solution

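  • In general, C++ requires every access to an object to be aligned to a boundary at least as strict as the one its type requires, because an object cannot exist at a misaligned address. This requirement is not specific to wcslen.

    On your system, alignof(wchar_t) is likely greater than 1 (wchar_t is typically 4 bytes on Linux x86/x64). Because malloc returns memory suitably aligned for any fundamental type, prebuffer + 1 is then always misaligned and therefore cannot contain a wchar_t object.

    Violating this requirement results in undefined behaviour, so wcslen returning 8 is a permitted outcome. Note also that the C loop quoted in the question is glibc's generic version; on x86/x64 glibc may substitute an optimized routine, and any implementation is entitled to assume correctly aligned input.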
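If the wide-character data really does arrive at an arbitrary byte offset (for example inside a packed network or file buffer), one way to stay within the rules above is to never dereference a misaligned wchar_t pointer at all, and instead move the bytes with memcpy. The following is a minimal sketch of that idea, not glibc code; the helper names IsAlignedForWchar, UnalignedWcsLen and CopyToAligned are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <vector>

// Hypothetical helper: true when p satisfies the alignment of wchar_t.
bool IsAlignedForWchar(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(wchar_t) == 0;
}

// Hypothetical helper: length of a null-terminated wide string stored at an
// arbitrary byte address. Each character is fetched with memcpy, so no
// misaligned wchar_t object is ever accessed. Scans at most maxChars
// characters and returns maxChars if no terminator is found.
std::size_t UnalignedWcsLen(const void *bytes, std::size_t maxChars)
{
    assert(bytes != nullptr);
    const unsigned char *p = static_cast<const unsigned char *>(bytes);
    std::size_t len = 0;

    while (len < maxChars)
    {
        wchar_t ch;
        std::memcpy(&ch, p + len * sizeof(wchar_t), sizeof ch);
        if (ch == L'\0')
            break;
        ++len;
    }

    return len;
}

// Hypothetical helper: copies count wide characters from an arbitrary byte
// address into properly aligned storage that is safe to pass to wcslen(),
// printf("%ls") and the other wide-string functions.
std::vector<wchar_t> CopyToAligned(const void *bytes, std::size_t count)
{
    assert(bytes != nullptr);
    std::vector<wchar_t> out(count);
    if (count != 0)
        std::memcpy(out.data(), bytes, count * sizeof(wchar_t));
    return out;
}
```

In the question's TestFunc(), the simplest fix is to keep tempbuffer at the address malloc returned. Helpers like these are only worth having when the offset is imposed by an external data layout; in that case, copy first (including the terminator) and then call wcslen() on the aligned copy.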