python-2.7 unicode winreg

Python 2.7: _winreg does not fully support reading/writing Unicode strings to the registry


I ran some tests; the results are shown in the attached image.

On this OS I can create a new folder named "新資料夾", and in the registry editor I can manually create a new REG_SZ value containing "新資料夾".

But when I pass the Chinese unicode string "新資料夾" to _winreg in Python 2.7, it writes a string of question marks (?????) instead.

Why? How can I write a correct Chinese string to the registry on an English OS?

My code:

# -*- coding: utf-8 -*-
from _winreg import *
import codecs
import os
import chardet

class Reg:
    #-------------------------------------------------------------------------
    # HKEY_LOCAL_MACHINE
    #-------------------------------------------------------------------------
    def HKLM_SetReg(self, RegDir, KeyDir, KeyName, KeyValue):
        try:
            if RegDir == "x64" :
                key = OpenKey(HKEY_LOCAL_MACHINE, KeyDir, 0, KEY_ALL_ACCESS | KEY_WOW64_64KEY)
            else :
                key = OpenKey(HKEY_LOCAL_MACHINE, KeyDir, 0, KEY_ALL_ACCESS | KEY_WOW64_32KEY)
        except WindowsError:  # the key does not exist yet, so create it
            if RegDir == "x64" :
                key = CreateKeyEx(HKEY_LOCAL_MACHINE, KeyDir, 0, KEY_ALL_ACCESS | KEY_WOW64_64KEY)
            else: 
                key = CreateKeyEx(HKEY_LOCAL_MACHINE, KeyDir, 0, KEY_ALL_ACCESS | KEY_WOW64_32KEY)

        SetValueEx(key, KeyName, 0, REG_SZ, KeyValue)
        CloseKey(key)        

p = Reg()

s = '初めまして' # Japanese
print 'string chardet = ', chardet.detect(s)
print 'repr = ', repr(s)
print type(s)
p.HKLM_SetReg("x86", ur"SOFTWARE\test", ur"input_string", s)

s = '初めまして' # Japanese
s_decode = s.decode('utf-8') # Japanese
print 'repr = ', repr(s_decode)
print type(s_decode)
p.HKLM_SetReg("x86", ur"SOFTWARE\test", ur"input_string_decode", s_decode)

s = u"新資料夾" # zh-tw
#print chardet.detect(s)
print 'repr = ', repr(s)
print type(s)
p.HKLM_SetReg("x86", ur"SOFTWARE\test", ur"input_unicode", s)


s = u"新資料夾" # zh-tw
s_encode = s.encode('utf-8')
#print chardet.detect(s)
print 'repr = ', repr(s_encode)
print type(s_encode)
p.HKLM_SetReg("x86", ur"SOFTWARE\test", ur"input_unicode_encode", s_encode)

Thank you for any help

Morris


Solution

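  • In Python 2.7, _winreg does not handle Unicode correctly. More specifically, I think it converts a unicode string to your system's ANSI code page and passes it through the narrow (ANSI) Windows API. That works when every character can be encoded in that code page. When a character can't be encoded, as with Chinese text on an English OS, it ends up as a question mark.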

    You could use the winreg_unicode package instead.
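
    To see why the question marks appear, here is a minimal sketch of the kind of lossy conversion described above. It does not touch the registry. The helper names are made up for illustration, and cp1252 is only an assumption standing in for the ANSI code page of an English Windows install; the real conversion happens inside _winreg.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumption: cp1252 stands in for the ANSI code page of an English Windows
# install. This only imitates the conversion; it is not what _winreg runs.
ANSI_CODEPAGE = "cp1252"


def narrow_roundtrip(text, codepage=ANSI_CODEPAGE):
    """Encode to an ANSI code page with replacement, then decode again."""
    return text.encode(codepage, "replace").decode(codepage)


def wide_roundtrip(text):
    """Encode to UTF-16-LE, the form the wide (W) Windows APIs take."""
    return text.encode("utf-16-le").decode("utf-16-le")


name = u"新資料夾"
lossy = narrow_roundtrip(name)      # no character fits in cp1252
lossless = wide_roundtrip(name)     # the string survives unchanged

print(lossy == u"?" * len(name))    # every character was replaced by "?"
print(lossless == name)
```

    The point is that a value only survives if it travels as UTF-16 through the wide API, which is presumably what a Unicode-aware replacement for _winreg has to do.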