Search code examples
pythonmysqlsql-injectionpymysqlstring-operations

Why is PyMySQL not vulnerable to SQL injection attacks?


I am new to PyMySQL and just tried to execute a query:

c.execute('''INSERT INTO mysql_test1 (
                                    data,
                                    duration,
                                    audio,
                                    comments
                                ) VALUES (
                                    ?,
                                    ?,
                                    ?,
                                    ?
                                );
                            ''', [
                                    comments_var,
                                    duration_var,
                                    audio_var,
                                    comments_var    
                                ]
                            );

However, it threw the following error:

TypeError: not all arguments converted during string formatting

I noticed that something must be wrong with my variables and read up on how to properly deal with them in PyMySQL, expecting methods for parameter substitution, but to my surprise I could not find anything. Instead, every thread I found used string operations (e.g. here, here, here and here (with a comment claiming that string operations would be standard with PyMySQL).

This is interesting to me because I have previously only dealt with SQLite where the DBAPI documentation explicitly warns to use string operations with variables:

SQL operations usually need to use values from Python variables. However, beware of using Python’s string operations to assemble queries, as they are vulnerable to SQL injection attacks.

The documentation exemplifies this with the following code snippet:

Never do this -- insecure!
symbol = 'RHAT'
cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Instead, use the DB-API’s parameter substitution.

When reading the PyMySQL docs, I could not find any mention of such dangers. It only confirmed my previous findings:

If args is a list or tuple, %s can be used as a placeholder in the query. If args is a dict, %(name)s can be used as a placeholder in the query.

Why is using string operations in sqlite3 considered vulnerable to SQL injection attacks and at the same time not questioned in pymysql?


Solution

  • It's a pity that the designers of pymysql chose to use %s as the parameter placeholder. It confuses many developers because that's the same as the %s used in string-formatting functions. But it's not doing the same thing in pymysql.

    It's not just doing a simple string substitution. Pymysql will apply escaping to the values before interpolating them into the SQL query. This prevents special characters from changing the syntax of the SQL query.

    In fact, you can get into trouble with pymysql too. The following is unsafe:

    cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
    

    Because it interpolates the variable symbol into the string before passing it as an argument to execute(). The only argument is then a finished SQL string with the variable(s) formatted into it.

    Whereas this is safe:

    cur.execute("SELECT * FROM stocks WHERE symbol = %s", (symbol,))
    

    Because it passes the list consisting of the symbol variable as a second argument. The code in the execute() function applies escaping to each element in the list, and interpolates the resulting value into the SQL query string. Note the %s is not delimited by single-quotes. The code of execute() takes care of that.