Tags: python, mysql, pymysql

PyMySQL encoding error while inserting binary data into a longblob column


I'm trying to insert the contents of a binary file into a longblob column:

Python code:

import pymysql

conn = pymysql.connect(...)
cursor = conn.cursor()
# Read the file as raw bytes
with open('test.bz2', 'rb') as fp:
    data = fp.read()
# Pass the bytes as a query parameter
cursor.execute('insert into test_t (test) values (%s)', [data])

Error stack trace:

Traceback (most recent call last):
  File "./doit2", line 9, in <module>
    cursor.execute('insert into test_t (test) values (%s)', [data])
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/cursors.py", line 127, in execute
    result = self._query(query)
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/cursors.py", line 275, in _query
    conn.query(q)
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/connections.py", line 763, in query
    sql = sql.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode character '\udcae' in position 45: ordinal not in range(256)

Create table script:

mysql> show create table test_t;
+--------+--------------------------------------------------------------------------+
| Table  | Create Table                                                             |
+--------+--------------------------------------------------------------------------+
| test_t | CREATE TABLE `test_t` (
  `test` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+--------+--------------------------------------------------------------------------+

Default Encoding:

=->python3 -c 'import sys; print(sys.getdefaultencoding())'
utf-8

Adding charset='utf8', use_unicode=True to the connect() call changes the error to:

Traceback (most recent call last):
  File "./doit2", line 13, in <module>
    cursor.execute('insert into test_t (test) values (%s)', [data])
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/cursors.py", line 127, in execute
    result = self._query(query)
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/cursors.py", line 275, in _query
    conn.query(q)
  File "/u02/srm_tp/local/lib/python3.4/site-packages/pymysql/connections.py", line 763, in query
    sql = sql.encode(self.encoding)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcae' in position 45: surrogates not allowed
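
Both tracebacks complain about the character '\udcae', a lone surrogate. That is what Python produces when the byte 0xAE is decoded with the surrogateescape error handler, so it looks as if the binary data was turned into a str that way and then encoded again without the same handler. The sketch below reproduces both error messages with the standard library only. The sample bytes and the helper try_encode are made up for illustration, and whether PyMySQL 0.6.4 followed exactly this path is an assumption, not something the traceback proves.

```python
# Made-up sample payload: a bz2-style header followed by one non-ASCII byte.
raw = b"BZh9\xae"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("ascii", "surrogateescape")


def try_encode(value, encoding):
    """Hypothetical helper: return the UnicodeEncodeError from a plain encode, or None."""
    try:
        value.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


# A plain encode fails for both connection encodings seen in the question.
latin1_error = try_encode(text, "latin-1")
utf8_error = try_encode(text, "utf-8")

# Encoding with the same error handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")

print(latin1_error)
print(utf8_error)
print(restored == raw)
```

This would also explain why changing the connection charset only changes the wording of the error: neither codec can encode a lone surrogate unless surrogateescape is passed to encode() as well.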

Solution

  • This appears to have been a PyMySQL bug. After upgrading from 0.6.4 to 0.6.6 (the latest release at the time of writing), the error no longer occurs.
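
To check whether an environment is still on an affected release, something like the following sketch may help. It assumes Python 3.8 or newer (for importlib.metadata), and the helpers parse_version and is_at_least are hypothetical names introduced here. The 0.6.6 threshold is simply the version reported as working above; the exact release that contained the fix is not confirmed.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Hypothetical helper: turn '0.6.4' or '1.0.0rc1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(lambda ch: ch in "0123456789", piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(version_text, minimum=(0, 6, 6)):
    """True if version_text is at or above the version reported as working."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("PyMySQL")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("PyMySQL is not installed in this environment")
elif is_at_least(installed):
    print("PyMySQL", installed, "is at or above 0.6.6")
else:
    print("PyMySQL", installed, "is older than 0.6.6; consider upgrading")
```

If the check reports an older release, upgrading with pip install --upgrade PyMySQL is the route taken above.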