
Unicode values in pymysql insert query


I am trying to insert Unicode values using pymysql, but I get a syntax error from the INSERT query. It works fine when I encode the values as UTF-8, but I don't want to do that.

This is my code:

from pymysql.cursors import *
import pymysql
from collections import OrderedDict
import datetime

class OrderedDictCursor(DictCursorMixin, Cursor):
    dict_type = OrderedDict

conn1 = pymysql.connect(host='127.0.0.1',
                       port=3306,
                       user='root',
                       passwd='pwd',
                       db='test',
                       charset='utf8',
                       use_unicode=True,
                       autocommit=True)

cursor1 = conn1.cursor(OrderedDictCursor)
odict = OrderedDict([(u'id', 374), (u'title', u'Chapter 4'), (u'intro_list', u'Objective:\r\n\r\n* Exit any mininet launch done earlier using \u201cmn \u2013c\u201d\r\n'), (u'solution', u'%%beginpanel%%\r\n\r\n## 1. net\r\n\r\n```\r\nmn -c\r\n```\r\n\r\n \u201cCTRL+C\u201d \r\n\r\n%%endpanel%%\r\n'), (u'created', datetime.datetime(2017, 3, 9, 7, 58, 7)), (u'modified', datetime.datetime(2017, 8, 28, 4, 58, 15))])
cols = odict.keys()
vals = odict.values()
cursor1.execute("INSERT INTO %s (%s) VALUES (%s)" % ("test1", ",".join(cols), (str(vals)[1:-1])))

This throws a syntax error:

pymysql.err.ProgrammingError: (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''Chapter 4', u'Objective:\r\n\r\n* Exit any mininet launch done earlier using \u' at line 1")

The insert succeeds when I encode the values as UTF-8, but that also encodes values such as \u201cmn \u2013c\u201d, and my app can't decode them back.

I need a solution to insert my unicode values as is into my MySQL database.

Anyone, please help.


Solution

  • "INSERT INTO %s (%s) VALUES (%s)" % ("test1", ",".join(cols), (str(vals)[1:-1]))
    

    Here is the query string that results from your data above:

    INSERT INTO test1 (id,title,intro_list,solution,created,modified)
    VALUES (
        374,
        u'Chapter 4',
        u'Objective:\r\n\r\n* Exit any mininet launch done earlier using \u201cmn \u2013c\u201d\r\n',
        u'%%beginpanel%%\r\n\r\n## 1. net\r\n\r\n```\r\nmn -c\r\n```\r\n\r\n \u201cCTRL+C\u201d \r\n\r\n%%endpanel%%\r\n',
        datetime.datetime(2017, 3, 9, 7, 58, 7),
        datetime.datetime(2017, 8, 28, 4, 58, 15)
    )
    

    Calling str() on your list of values produces the Python representation (repr) of that list. Python's syntax for literal values is very different from SQL's, so slicing off the list's square brackets with [1:-1] doesn't come close to producing valid SQL:

    • u'...' is not a valid SQL string literal;
    • even if you encoded to UTF-8, so you got the Python 2 byte string literal '...', the syntax differs from SQL in several ways so it would only sometimes work;
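    • for example, the \uXXXX escapes that Python 2's repr() produces for characters like \u201c are not SQL escapes: standard SQL string literals have no backslash escapes at all, and MySQL treats an unrecognised sequence like \u as a plain u;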
    • datetime.datetime is a Python datatype, not an SQL one.
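To see the problem directly, here is a minimal sketch that builds the query the same way the question does, using a hypothetical, shortened version of the row. It assumes Python 3, where repr() no longer adds the u prefix and shows characters like \u201c as themselves, so the exact garbage differs from the question's Python 2 output; the \r\n escapes and the datetime.datetime(...) call still land in the SQL text either way.

```python
import datetime

# Hypothetical, shortened version of the question's row values
vals = [374, 'Chapter 4',
        'Objective:\r\n* Exit using \u201cmn \u2013c\u201d\r\n',
        datetime.datetime(2017, 3, 9, 7, 58, 7)]

# What the question's code effectively does: take the Python repr of the
# list and slice off the surrounding square brackets
values_sql = str(vals)[1:-1]
query = 'INSERT INTO test1 (id,title,intro_list,created) VALUES (%s)' % values_sql

# query now contains Python syntax, not SQL: literal backslash-r/backslash-n
# sequences inside the string and a datetime.datetime(2017, 3, 9, 7, 58, 7) call
```

Nothing in that text is something MySQL knows how to parse as a timestamp, and the string escapes follow Python's rules rather than SQL's.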

    The main mistake is trying to include your values in the query string itself. Even if you formatted each value individually, you would still risk getting the exact syntax and escaping rules wrong, and you could end up with SQL injection vulnerabilities.

    Instead, you should use parameterised queries, passing a list of parameter values in the second argument to execute():

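    table_name = 'test1'
    cols_str = ', '.join(cols)
    # one %s placeholder per value; pymysql fills these in with escaped values
    params_str = ', '.join(['%s'] * len(vals))
    query = 'INSERT INTO %s (%s) VALUES (%s)' % (table_name, cols_str, params_str)
    # pass the values separately (as a tuple, which also works on Python 3 dict views)
    cursor1.execute(query, tuple(vals))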
    

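    Note: the %s sequences in params_str are different from the three %s slots in the query template. Python's % operator fills those slots with the table name, column list and placeholder list, but the %s text copied in from params_str survives into query literally. execute() then treats each one as a placeholder for the corresponding value in vals.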
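A quick way to convince yourself of this is to build the query and look at it before it reaches execute(). The table name, columns and values below are hypothetical stand-ins for the question's data.

```python
# Hypothetical values standing in for the question's table and columns
table_name = 'test1'
cols = ['id', 'title', 'intro_list']
vals = (374, 'Chapter 4', 'Objective: \u201cmn \u2013c\u201d')

cols_str = ', '.join(cols)
params_str = ', '.join(['%s'] * len(vals))
# Python's % operator fills the three template slots once...
query = 'INSERT INTO %s (%s) VALUES (%s)' % (table_name, cols_str, params_str)
# ...and does not re-scan what it inserted, so params_str's %s text survives
print(query)  # INSERT INTO test1 (id, title, intro_list) VALUES (%s, %s, %s)
```

The values themselves never appear in query; they only travel in the second argument to execute().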

    It's pretty confusing that the parameter placeholder looks the same as the string-formatting placeholder, but that's the paramstyle pymysql chooses (pyformat, which accepts %s and %(name)s). Other DB-API modules use different styles; sqlite3, for example, uses ?.
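For comparison, here is the same parameterised-insert pattern with Python's built-in sqlite3 module, swapped in only because it runs without a server; it is not MySQL, and it uses the qmark paramstyle (?) instead of %s. The row is a hypothetical, cut-down version of the question's data (datetimes left out). The point is that the Unicode text goes in as a parameter and comes back out unchanged, with no manual encoding.

```python
import sqlite3
from collections import OrderedDict

# Hypothetical row modelled on the question's data (datetimes omitted)
row = OrderedDict([
    ('id', 374),
    ('title', 'Chapter 4'),
    ('intro_list', 'Objective:\r\n\r\n* Exit any mininet launch done '
                   'earlier using \u201cmn \u2013c\u201d\r\n'),
])

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test1 (id INTEGER, title TEXT, intro_list TEXT)')

cols = list(row.keys())
vals = tuple(row.values())
# sqlite3 uses "qmark" placeholders: ? where pymysql would use %s
placeholders = ', '.join(['?'] * len(vals))
query = 'INSERT INTO test1 (%s) VALUES (%s)' % (', '.join(cols), placeholders)
conn.execute(query, vals)

stored = conn.execute('SELECT intro_list FROM test1 WHERE id = ?', (374,)).fetchone()[0]
print(stored == row['intro_list'])  # True: the Unicode text round-trips unchanged
conn.close()
```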

    Also not addressed: we're not escaping the table or column names. That will go wrong if an SQL keyword is used as a name, and it opens SQL injection holes if the names come from untrusted input (though that is much less common than it is for values). Standard SQL quotes identifiers by wrapping them in double quotes and doubling any double quote inside; MySQL uses backticks by default (doubling any backtick inside) and accepts double-quoted identifiers only when the ANSI_QUOTES SQL mode is enabled.
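A minimal sketch of backtick quoting for MySQL identifiers follows. quote_mysql_identifier and build_insert are hypothetical helpers, not part of pymysql; they follow MySQL's rule of wrapping a name in backticks and doubling any backtick inside it. For names that come from untrusted input, checking them against a whitelist of known columns is still the safer option.

```python
def quote_mysql_identifier(name):
    """Hypothetical helper: wrap a MySQL identifier in backticks.

    A backtick inside the name is escaped by doubling it. Backtick quoting
    works whether or not the ANSI_QUOTES SQL mode is enabled.
    """
    return '`' + name.replace('`', '``') + '`'


def build_insert(table_name, cols):
    """Hypothetical helper: INSERT with quoted identifiers and %s value placeholders."""
    cols_str = ', '.join(quote_mysql_identifier(c) for c in cols)
    params_str = ', '.join(['%s'] * len(cols))
    return 'INSERT INTO %s (%s) VALUES (%s)' % (
        quote_mysql_identifier(table_name), cols_str, params_str)


print(build_insert('test1', ['id', 'order', 'weird`name']))
# INSERT INTO `test1` (`id`, `order`, `weird``name`) VALUES (%s, %s, %s)
```

With pymysql, the result would then be used as cursor1.execute(build_insert('test1', cols), tuple(vals)).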

    All this complexity is why it's a good idea to use an existing data access layer that works this out for you.

    I need a solution to insert my unicode values as is into my MySQL database.

    Apart from the DB-API issues here, you also need to make sure your tables and columns use a character set that can represent every character you want to store. On MySQL versions before 8.0, the out-of-the-box default is the latin1 character set with the latin1_swedish_ci collation, which is useless for general Unicode text.

    You can specify, e.g., CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci on each string column in CREATE TABLE, as a table option, or in the initial CREATE DATABASE so that tables created afterwards inherit it.
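As an illustration, here is a sketch of a utf8mb4 table definition for the question's test1 table, kept as Python constants so it can sit next to the connection code. The column types are guesses and create_test1_table is a hypothetical helper that needs PyMySQL and a reachable server. It also requests utf8mb4 on the connection, because MySQL's plain utf8 character set is the 3-byte utf8mb3, which cannot store characters outside the Basic Multilingual Plane.

```python
# Hypothetical DDL for the question's test1 table; column types are guesses.
# The table-level CHARACTER SET/COLLATE applies to every string column below.
CREATE_TEST1 = """
CREATE TABLE test1 (
    id INT PRIMARY KEY,
    title VARCHAR(255),
    intro_list TEXT,
    solution TEXT,
    created DATETIME,
    modified DATETIME
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
"""

# Request utf8mb4 on the connection too (MySQL's plain 'utf8' is 3-byte utf8mb3)
CONNECT_KWARGS = dict(host='127.0.0.1', port=3306, user='root', password='pwd',
                      database='test', charset='utf8mb4', autocommit=True)


def create_test1_table():
    """Create the table on a live server (requires PyMySQL and a reachable MySQL)."""
    import pymysql  # imported here so the constants above can be used without it
    conn = pymysql.connect(**CONNECT_KWARGS)
    try:
        with conn.cursor() as cur:
            cur.execute(CREATE_TEST1)
    finally:
        conn.close()
```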