I'm building an application in web.py, but am having some trouble when it comes to storing encrypted data in MySQL.
PyCrypto creates cypher text that looks like:'\x06\x7f\x81\xa0\xf4d\x00H\xef\xd0\x18[c\x18Z\xf8
', which when printed, shows up as " ôdHïÐ[cZø
"
However, MySQL is storing it as: ???d H??[cZ?
I'm storing it in the following manner:
query_string = "INSERT INTO %s (%s) VALUES (\"%s\")" % (table, column, value)
I tried using "SET character_set_connection=utf8
" after connecting to the database, but it didn't yield any change in the results.
I'm obviously missing something very important. Any thoughts?
MySQL is trying to store your byte string in a character column. Because the connection character set is UTF-8 but the byte string doesn't represent a valid UTF-8 sequence, it gets mangled.
To get raw bytes into the database properly you need to:
make the column a BINARY type (or generally character type with a binary collation), and
use parameterised queries to get the data into the database instead of interpolating them into the query string where they might mix with non-binary (Unicode) content.
You should use parameterised queries anyway because the string interpolation you're using now, with no escaping, is vulnerable to SQL injection. In web.py that might look like:
query_string= 'INSERT INTO %s (%s) VALUES ($value)' % (table, column)
db.query(query_string, vars= {'value': value})
(assuming that the table
and column
values are known-good.)
Doing it like this also means you don't have to worry about the dollar sign.
The other approach is to use a normal character string, encoding away the non-ASCII bytes. You're doing this with uucode in your current workaround, but base64 would be a more common alternative that's easier to get to in Python (ciphertext.encode('base64')
). Hex encoding (.encode('hex')
) is most common for the case of a hash.