Passing utf-16 string to a Windows function


I have a Windows DLL called some.dll with the following function:

void some_func(TCHAR* input_string)
{
...
}

some_func expects a pointer to a UTF-16 encoded string.

Running this Python code:

from ctypes import *

some_string = "disco duck"
param_to_some_func = c_wchar_p(some_string.encode('utf-16'))  # the exception is raised here

some_dll = WinDLL("some.dll")
some_dll.some_func(param_to_some_func)

fails with the exception "TypeError: unicode string or integer address expected instead of bytes instance".

The documentation for ctypes and ctypes.wintypes is very thin, and I have not found a way to convert a Python string to a Windows wide-character string and pass it to a function.


Solution

  • According to [Python 3.Docs]: Built-in Types - Text Sequence Type - str (emphasis is mine):

    Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.

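    On Windows, ctypes hands them to native code as wchar_t strings, which are UTF-16 encoded there (wchar_t is 16 bits wide on Windows).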

    So, the correspondence between CTypes and Python (also visible by checking the differences between):

    ╔═══════════════╦══════════════╦══════════════╗
    ║    CTypes     ║   Python 3   ║   Python 2   ║
    ╠═══════════════╬══════════════╬══════════════╣
    ║   c_char_p    ║    bytes     ║     str      ║
    ║   c_wchar_p   ║     str      ║   unicode    ║
    ╚═══════════════╩══════════════╩══════════════╝
    

    Example:

    • Python 3:

      >>> import ctypes as cts
      >>> import sys
      >>>
      >>> sys.version
      '3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]'
      >>>
      >>> text_ascii = b"Dummy"
      >>> text_unicode = "Dummy"
      >>>
      >>> cts.c_char_p(text_ascii)
      c_char_p(2563882450144)
      >>>
      >>> cts.c_wchar_p(text_ascii)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: unicode string or integer address expected instead of bytes instance
      >>>
      >>> cts.c_char_p(text_unicode)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: bytes or integer address expected instead of str instance
      >>>
      >>> cts.c_wchar_p(text_unicode)
      c_wchar_p(2563878400656)
      
    • Python 2 (note that str <=> unicode conversions are performed automatically):

      >>> import ctypes as cts
      >>> import sys
      >>>
      >>> sys.version
      '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]'
      >>>
      >>> text_ascii = "Dummy"
      >>> text_unicode = u"Dummy"
      >>>
      >>> cts.c_char_p(text_ascii)
      c_char_p('Dummy')
      >>>
      >>> cts.c_wchar_p(text_ascii)
      c_wchar_p(u'Dummy')
      >>>
      >>> cts.c_char_p(text_unicode)
      c_char_p('Dummy')
      >>>
      >>> cts.c_wchar_p(text_unicode)
      c_wchar_p(u'Dummy')
      

    Back to your situation:

    >>> import ctypes as cts
    >>>
    >>> some_string = "disco duck"
    >>>
    >>> enc_utf16 = some_string.encode("utf16")
    >>> enc_utf16
    b'\xff\xfed\x00i\x00s\x00c\x00o\x00 \x00d\x00u\x00c\x00k\x00'
    >>>
    >>> type(some_string), type(enc_utf16)
    (<class 'str'>, <class 'bytes'>)
    >>>
    >>> cts.c_wchar_p(some_string)  # This is the right way
    c_wchar_p(2508534214928)
    >>>
    >>> cts.c_wchar_p(enc_utf16)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode string or integer address expected instead of bytes instance
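
To illustrate why the pre-encoded bytes are not needed, here is a small sketch that compares the buffer ctypes builds from a str with the result of str.encode. It assumes that wchar_t is 2 bytes (Windows) or 4 bytes (most other platforms) and picks the codec accordingly; the helper name wide_bytes is made up for this example.

```python
import ctypes
import sys


def wide_bytes(text):
    """Return the raw wchar_t bytes ctypes builds for `text`, without the trailing NUL."""
    buf = ctypes.create_unicode_buffer(text)  # NUL-terminated array of wchar_t
    raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))
    return raw[:-ctypes.sizeof(ctypes.c_wchar)]  # drop the terminating NUL


some_string = "disco duck"
wchar_size = ctypes.sizeof(ctypes.c_wchar)  # 2 on Windows, usually 4 elsewhere
endian = "le" if sys.byteorder == "little" else "be"
codec = ("utf-16-" if wchar_size == 2 else "utf-32-") + endian

native = wide_bytes(some_string)

# ctypes already produced the platform's wide encoding, with no byte order mark.
print(native == some_string.encode(codec))
# The plain "utf-16" codec prepends a BOM, so its output differs from the buffer.
print(native == some_string.encode("utf-16"))
```

In other words, the conversion to the platform's wide-character layout happens inside ctypes when a str is given, and the BOM added by encode("utf-16") would be an extra character from the function's point of view.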
    

    As a side note, TCHAR is a typedef whose meaning depends on whether _UNICODE is defined: it is wchar_t when it is, and char when it is not. Check [MS.Learn]: Generic-Text Mappings in tchar.h for more details. So, depending on the compilation flags of the C code, the Python code might also need adjustments.
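
A minimal sketch of what such an adjustment could look like. Since some.dll is not available here, two ctypes callbacks stand in for the Unicode and non-Unicode builds of some_func; the names fake_wide_func, fake_narrow_func and call_some_func are hypothetical, and the "ascii" encoding for the narrow build is only an assumption (the real code page depends on how the DLL interprets char strings).

```python
import ctypes

received = []  # records what the stand-in "native" functions were given

# Hypothetical stand-ins for some_func, one per build flavor.
WideProto = ctypes.CFUNCTYPE(None, ctypes.c_wchar_p)   # TCHAR* is wchar_t* (_UNICODE defined)
NarrowProto = ctypes.CFUNCTYPE(None, ctypes.c_char_p)  # TCHAR* is char* (_UNICODE not defined)


@WideProto
def fake_wide_func(input_string):
    received.append(input_string)


@NarrowProto
def fake_narrow_func(input_string):
    received.append(input_string)


def call_some_func(text, unicode_build=True):
    """Pass `text` in the form the matching build of some_func would expect."""
    if unicode_build:
        fake_wide_func(text)  # a str is passed as is
    else:
        fake_narrow_func(text.encode("ascii"))  # bytes; the encoding is an assumption


call_some_func("disco duck", unicode_build=True)
call_some_func("disco duck", unicode_build=False)
print(received)
```

With the real DLL, the same idea would be expressed by setting some_dll.some_func.argtypes to (ctypes.c_wchar_p,) or (ctypes.c_char_p,) and some_dll.some_func.restype to None before the call, so that ctypes rejects an argument of the wrong type instead of passing it through silently.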

    You could also check: