Search code examples
python-3.xpandaspandas-datareader

Why pandas.read_sql returns an empty DataFrame?


I'm trying to retrieve the data from database and save into pandas.DataFrame. Here is my Python script,

conn = pyodbc.connect(sql_server)
query = '''SELECT a1, a2, a3
FROM '''  + dbschema + '''.SomeResults
WHERE FactorName = \' ''' + FactorName + ''' \' AND parametername = 'Param1' ORDER BY Factor1 '''
df = pd.read_sql(query, conn)
print(df)

However, it returns,

Empty DataFrame
Columns: [a1, a2, a3]
Index: []

I'm pretty sure it's not SQL problem, as I can retrieve the data from database using conn.cursor().


Solution

  • the reason is the way of generating that SQL:

    In [307]: dbschema = 'db'
    
    In [308]: FactorName = 'Factor1'
    
    In [309]: query = '''SELECT a1, a2, a3
         ...: FROM '''  + dbschema + '''.SomeResults
         ...: WHERE FactorName = \' ''' + FactorName + ''' \' AND parametername = 'Param1' ORDER BY Factor1 '''
    
    In [310]: print(query)
    SELECT a1, a2, a3
    FROM db.SomeResults
    WHERE FactorName = ' Factor1 ' AND parametername = 'Param1' ORDER BY Factor1
    
    # NOTE: spaces      ^       ^
    

    You should not generate SQL this way, as it might be dangerous (read about SQL injections).

    This would be a proper way:

    query = """
    SELECT a1, a2, a3
    FROM {}.SomeResults
    WHERE FactorName = ? AND parametername = 'Param1'
    ORDER BY Factor1
    """
    
    df = pd.read_sql(query.format(dbschema), conn, params=(FactorName,))
    

    NOTE: only literals can be parameterized. I.e. we can NOT parameterized schema names, table names, column, names, etc.

    Here is a funny example of a SQL injection:

    enter image description here