On my server, a Python script gets data from a database as a tuple. Then the script converts the tuple to a string (using json.dumps()) to be passed to the JavaScript script in the user's browser.
The data include German names such as Weidmüller. When the Python script gets that data, it returns it as Weidm\xfcller, where \xfc is the UTF-8 encoding of ü. So far so good.
However,
json.dumps(tableData,ensure_ascii=False)
converts the \xfc to �, and
json.dumps(tableData,ensure_ascii=True)
fails: "UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 5: invalid start byte"
What I really want is for json.dumps to leave the UTF-8 encoded character alone; to just pass the \xfc as is. That way the JavaScript script in the user's browser can do the decoding. Is that possible?
Or, am I approaching the problem incorrectly?
Here is the complete code:
import MySQLdb
...
# Open the data base and return a handle to it and its cursor
dataBase, dbCursor = database.OpenDB()
# Get data from the URL
fieldStore = cgi.FieldStorage()
selFieldName = selFieldValue = ''
sqlQuery = 'SELECT * FROM %s' % (database.CompTableName)
if ('fldName' in fieldStore) and ('fldValue' in fieldStore):
    fldName = fieldStore['fldName'].value
    fldValue = fieldStore['fldValue'].value
    sqlQuery += ' WHERE %s = \'%s\'' % (fldName,fldValue)
if ('max' in fieldStore):
    maxRows = fieldStore['max'].value
    sqlQuery += ' LIMIT ' + maxRows
# Get the selected data in the table as a list of lists
rowsAffected = dbCursor.execute(sqlQuery)
tableData = dbCursor.fetchall()
# Close the database and return the results
dataBase.close()
jsonTableData = json.dumps(tableData,encoding='latin1',ensure_ascii=True)
print jsonTableData
And here is test code:
tableData = (('item1', 'Jones',), ('item2', 'Weidm\xfcller'))
jsonTableData = json.dumps(tableData,encoding='latin1',ensure_ascii=True)
print jsonTableData
\xfc is not the UTF-8 encoding of ü; it's the Latin-1 encoding.
>>> u'ü'.encode('latin-1')
'\xfc'
>>> u'ü'.encode('utf-8')
'\xc3\xbc'
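This also explains the UnicodeDecodeError in the question: a lone 0xfc byte is not valid UTF-8, so decoding it as UTF-8 fails, while decoding it as Latin-1 succeeds. A minimal sketch, assuming the database handed back the raw byte string shown in the test code (the variable names here are only illustrative):

```python
# -*- coding: utf-8 -*-
# The byte string from the question's test data.
raw = b'Weidm\xfcller'

# Decoding as UTF-8 fails, because 0xfc cannot start a UTF-8 sequence.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as Latin-1 succeeds and yields the intended text.
text = raw.decode('latin-1')

print(utf8_error.start)  # index of the offending byte
print(repr(text))
```

The reported index matches the "position 5" in the error message, which is where the \xfc byte sits in the name.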
If you pass properly decoded (unicode) text to json.dumps, you shouldn't get replacement characters like that:
>>> json.dumps({u"k": u"Weidmüller"})
'{"k": "Weidm\\u00fcller"}'
>>> json.dumps({u"k": u"Weidmüller"}, ensure_ascii=False)
u'{"k": "Weidm\xfcller"}'
Check to make sure that what you're getting from the database is correctly decoded text in the first place.
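If the rows really do arrive as Latin-1 byte strings, one option is to decode each cell to text before serializing. The following is a rough sketch under that assumption; decode_cell and rows_to_json are hypothetical helper names, not part of json or MySQLdb, and the Latin-1 default is a guess based on the \xfc byte in the question.

```python
# -*- coding: utf-8 -*-
import json


def decode_cell(value, encoding='latin-1'):
    """Decode byte strings to text; leave other values (ints, None, text) alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def rows_to_json(rows, encoding='latin-1'):
    """Serialize a sequence of rows after decoding every byte-string cell."""
    decoded = [[decode_cell(cell, encoding) for cell in row] for row in rows]
    # With the default ensure_ascii=True, non-ASCII characters are emitted as
    # \uXXXX escapes, which any JSON parser in the browser turns back into text.
    return json.dumps(decoded)


# Assumed sample data, mirroring the test code in the question.
table_data = ((b'item1', b'Jones'), (b'item2', b'Weidm\xfcller'))
print(rows_to_json(table_data))
```

On the JavaScript side, JSON.parse then yields the name with a real ü, so no manual decoding is needed in the browser. If the connection is opened with MySQLdb.connect, its charset and use_unicode arguments are another place to arrange for the driver to return decoded text in the first place.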