
Python3 unicode strings with pyobjc


I'm converting a lot of Python 2 scripts that use pyobjc to Python 3, and I'm having trouble getting them to work. The problem seems to be related to the Unicode changes in Python 3.

The following call to a pyobjc method works in python2:

import Quartz
filename = '/path/to/myfile.pdf'
provider = Quartz.CGDataProviderCreateWithFilename(filename)

but in Python 3 I get: ValueError: depythonifying 'char', got 'str' of 1

This can be fixed by encoding the string first:

filenameNonU = filename.encode('utf-8')
provider = Quartz.CGDataProviderCreateWithFilename(filenameNonU)

... and the script works, unless the string includes non-ASCII characters (e.g. Ä∂∫ß), in which case I get: ValueError: depythonifying 'char', got 'int' of wrong magnitude

Using the 'raw-unicode-escape' codec works for strings in the ASCII range. For strings with non-ASCII characters it raises no error, but the method just returns None. So it seems like it's just a question of getting the right codec.

So, my question is: what do I need to do to get my strings in the same format as python2 was using, so that the pyobjc method will deal with them correctly?

python2 returns something like:

A\xcc\x88\xc6\x92\xe2\x88\x82

for Unicode characters above 127, and I get the same result in Python 3 when encoding as UTF-8, except for the b prefix.

raw_unicode_escape gives something like A\\u0308\\u0192\\u2202, which is a different format.
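
The difference between the two codecs can be checked without pyobjc. This is a minimal sketch in plain Python; the variable names are only illustrative, and the sample string is the one implied by the byte sequences above ("A" followed by a combining diaeresis, then ƒ and ∂).

```python
# Sample string: "A" + U+0308 (combining diaeresis), U+0192 (ƒ), U+2202 (∂).
sample = "A\u0308\u0192\u2202"

# UTF-8 turns each non-ASCII code point into two or three bytes.
utf8_bytes = sample.encode("utf-8")

# raw_unicode_escape writes code points above U+00FF as the literal
# ASCII text "\uXXXX" (backslash, "u", four hex digits).
escaped_bytes = sample.encode("raw_unicode_escape")

print(utf8_bytes)     # b'A\xcc\x88\xc6\x92\xe2\x88\x82'
print(escaped_bytes)  # b'A\\u0308\\u0192\\u2202'

# Only the UTF-8 bytes decode back to the original text as UTF-8.
print(utf8_bytes.decode("utf-8") == sample)     # True
print(escaped_bytes.decode("utf-8") == sample)  # False
```

Because the raw_unicode_escape result is pure ASCII, it passes the 'char' conversion without error. It spells out a different name, though (with literal backslashes in it), which is probably why the method returns None instead of opening the file.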

It's no coincidence that the methods with this problem use pointers as their arguments in ObjC. But one of the benefits of python is that it (up to now) handles things like types and pointers invisibly.


Solution

  • I've got in touch with Ronald Oussoren, the maintainer of pyObjC, and he's confirmed there's a bug causing the problem with characters above 255.

    This has now been fixed in pyobjc 8.5.

    For the avoidance of doubt, the correct encoding for strings passed as arguments is UTF-8.
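
As a rough sketch of how that advice might be applied when porting many scripts, a small helper can do the encoding in one place. The name as_utf8_bytes is hypothetical and not part of pyobjc, and the helper assumes that any bytes it receives are already UTF-8. The Quartz call is left as a comment so the snippet runs without pyobjc installed.

```python
import os


def as_utf8_bytes(path):
    """Return a path as UTF-8 bytes (hypothetical helper, not part of pyobjc)."""
    path = os.fspath(path)  # accepts str, bytes, or os.PathLike objects
    if isinstance(path, bytes):
        return path  # assumption: bytes passed in are already UTF-8
    return path.encode("utf-8")


filename = "/path/to/myfile.pdf"
encoded = as_utf8_bytes(filename)
print(encoded)  # b'/path/to/myfile.pdf'

# With pyobjc 8.5 or later, the call from the question would then be:
# provider = Quartz.CGDataProviderCreateWithFilename(encoded)
```

Keeping the encoding in one function means the codec only has to be changed in one place if a particular call turns out to need something else.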