Tags: mysql, python-3.x, sqlalchemy, pymysql

How to execute multiple SQL commands at once in pd.read_sql_query?


Let me set up a use case to discuss.

CREATE  DATABASE sample;
USE sample;
CREATE TABLE quote (
  `id` int(2) unsigned NOT NULL AUTO_INCREMENT,
  `code` text ,
  `date` date DEFAULT NULL,
  `close` double DEFAULT NULL,
  PRIMARY KEY (`id`)
) ;

INSERT INTO quote (`code`, `date`, `close`)
VALUES ('epm', '20200824', 2.64); 
INSERT INTO quote (`code`, `date`, `close`)
VALUES ('dss', '20200824', 6.4); 

Executing a single SQL command with SQLAlchemy is simple.

import pandas as pd 
from sqlalchemy import create_engine
user = 'root'
mysql_pass = 'your mysql passwd'
mysql_ip = '127.0.0.1'
engine = create_engine("mysql+pymysql://{}:{}@{}:3306".format(user,mysql_pass,mysql_ip))
cmd_one_line_sql = 'select * from sample.quote;'
df = pd.read_sql_query(cmd_one_line_sql,con = engine)
df 
   id code        date  close
0   1  epm  2020-08-24   2.64
1   2  dss  2020-08-24   6.40

I get the desired result. Now suppose the command contains multiple SQL statements; for simplicity, only two:

cmd_multi_lines_sql = 'use sample;select * from quote;'

cmd_multi_lines_sql simply splits cmd_one_line_sql into two statements.
I rewrote the snippet following the SQLAlchemy manual on executing multiple SQL commands:

import pandas as pd
from sqlalchemy import create_engine
user = 'root'
mysql_pass = 'your mysql passwd'
mysql_ip = '127.0.0.1'
engine = create_engine("mysql+pymysql://{}:{}@{}:3306".format(user,mysql_pass,mysql_ip))
connection = engine.raw_connection()
cmd_multi_lines_sql = 'use sample;select * from quote;'
try:
    cursor = connection.cursor()
    cursor.execute(cmd_multi_lines_sql)
    results_one = cursor.fetchall()
finally:
    connection.close()

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/usr/local/lib/python3.5/dist-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.5/dist-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'select * from quote' at line 1")

Another attempt, with the commands saved in /tmp/test.sql:

vim  /tmp/test.sql
use sample;
select * from quote;

# read the commands from /tmp/test.sql
with open('/tmp/test.sql', 'r') as f:
    cmd = f.read()
df = pd.read_sql_query(cmd, con=engine)

This produces the same error. How can I fix it?


Solution

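  • You are facing two issues:

    1. The server only accepts several statements in one string if the client sets the MULTI_STATEMENTS flag, so you need to pass that flag to PyMySQL, and
    2. read_sql_query assumes that the first result set contains the data for the DataFrame, which may not be true for a multi-statement batch. In your example the first statement, use sample, returns no rows.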

    You can create your own PyMySQL connection and retrieve the data like this:

    import pandas as pd
    import pymysql
    from pymysql.constants import CLIENT
    
    conn_info = {
        "host": "localhost",
        "port": 3307,
        "user": "root",
        "password": "toot",
        "database": "mydb",
        "client_flag": CLIENT.MULTI_STATEMENTS,
    }
    
    cnxn = pymysql.connect(**conn_info)
    crsr = cnxn.cursor()
    
    sql = """\
    CREATE TEMPORARY TABLE tmp (id int primary key, txt varchar(20)) 
        ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
    INSERT INTO tmp (id, txt) VALUES (1, 'foo'), (2, 'ΟΠΑ!');
    SELECT id, txt FROM tmp;
    """
    crsr.execute(sql)
    
    num_tries = 5
    result = None
    for i in range(num_tries):
        result = crsr.fetchall()
        if result:
            break
        crsr.nextset()
    
    if not result:
        print(f"(no result found after {num_tries} attempts)")
    else:
        df = pd.DataFrame(result, columns=[x[0] for x in crsr.description])
        print(df)
        """console output:
           id   txt
        0   1   foo
        1   2  ΟΠΑ!
        """
    

    (Edit) Additional notes:

    Note 1: As mentioned in another answer, you can pass the MULTI_STATEMENTS flag through the connect_args argument of SQLAlchemy's create_engine function. If you need a SQLAlchemy Engine object for other things (e.g., for to_sql), that may be preferable to creating your own PyMySQL connection directly.
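
The connect_args route from Note 1 could look roughly like the sketch below. It assumes SQLAlchemy and PyMySQL are installed; the helper names build_url and make_engine and the default database "sample" are illustrative choices, not part of either library. The password is percent-encoded so that characters such as '@' do not break the URL.

```python
from urllib.parse import quote


def build_url(user, password, host, port=3306, database="sample"):
    """Build a mysql+pymysql URL (hypothetical helper for this sketch)."""
    # Percent-encode credentials so '@', '/' or spaces cannot corrupt the URL.
    return "mysql+pymysql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


def make_engine(user, password, host, port=3306, database="sample"):
    """Create an Engine whose PyMySQL connections accept multi-statement strings."""
    # Imported here so build_url stays usable without the database packages.
    from pymysql.constants import CLIENT
    from sqlalchemy import create_engine

    return create_engine(
        build_url(user, password, host, port, database),
        # connect_args is forwarded to pymysql.connect(...)
        connect_args={"client_flag": CLIENT.MULTI_STATEMENTS},
    )
```

An engine built this way still has the second issue: read_sql_query looks only at the first result set, so for a batch such as 'use sample;select * from quote;' you would take engine.raw_connection(), run the batch on its cursor, and step through the result sets with nextset as shown above.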

    Note 2: num_tries can be arbitrarily large; it is simply a way of avoiding an endless loop. If we need to skip the first n empty result sets then we need to call nextset that many times regardless, and once we've found the non-empty result set we break out of the loop.
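
The skipping loop described in Note 2 can be wrapped in a small helper. This is only a sketch: the name first_nonempty_result and the max_sets default are made up for illustration, and it relies on the DB-API convention that nextset() returns a false value once no result sets remain. Unlike the loop above, it stops as soon as nextset() reports that nothing is left, so the cap is just a safety net.

```python
def first_nonempty_result(cursor, max_sets=100):
    """Return (rows, column_names) for the first result set that has rows.

    Works with any DB-API cursor that implements nextset(), such as a
    PyMySQL cursor. Returns (None, None) if no result set contains rows.
    """
    for _ in range(max_sets):
        rows = cursor.fetchall()
        if rows:
            # description holds one 7-item sequence per column; item 0 is the name
            columns = [col[0] for col in cursor.description]
            return rows, columns
        # Move to the next result set; stop if there is none
        if not cursor.nextset():
            break
    return None, None
```

With the cursor from the answer, usage would be rows, columns = first_nonempty_result(crsr), followed by pd.DataFrame(rows, columns=columns) when rows is not None.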