Search code examples
pythonsql-serverpandaspyodbc

How to create a parameterized query for pandas with a dynamic table name that avoids SQL injection?


I am using SQL Server, pandas, and pyodbc.

I am trying to make a function that can take a column name, table name, another column name (for a GUID-type column), and a GUID value and make an SQL call like SELECT columnName1 FROM tableName WHERE columnName2 = guidValue.

This function was working just fine when I wasn't worrying about SQL injection and I was just manually sticking square brackets where needed:

def get_sql_entity_set_data(entity_set_name, filter_expression=None, query_params='*'):
    connString = get_connection_string()
    conn = pyodbc.connect(connString)
    query = "SELECT " + query_params + " FROM [" + entity_set_name + "]"

    if filter_expression:
        query += " WHERE " + filter_expression

    logger.debug(f'Attempting the following SQL command: {query}.')
    retValue = pd.read_sql(query, conn)

    returnCount = len(retValue.index)

    if returnCount == 0:
        logger.warning('Fetch failed: no data retrieved.')
    else:
        logger.info(f'{returnCount} row(s) retrieved.')

    return retValue

(When I was doing it that way, I would pass the full WHERE clause as a parameter - e.g., "[Name] = 'Bob Smith'".)

But now I am trying to protect against SQL injection and having a devil of a time. Nothing I try works. I have learned that you can't parameterize table names. I also learned that you can't pass the asterisk '*' as a parameter - e.g., SELECT ? FROM..., so my function got a lot uglier with if logic to handle whether the SELECT column name was specified in the parameters or it was just defaulted to '*'.

I was under the impression that I would be able to work around the can't-parameterize-table-name issue by making a prepared statement using EXEC, so I have tried various prepared statements build up from this basic idea (just trying to come up with a simplified example off the top of my head)...

DECLARE @table nvarchar(128);
DECLARE @query nvarchar(max);
SET @table = ?;
SET @query = 'SELECT * FROM @table';
EXEC @query;

...but that didn't work.

At this point, I have given up on that approach and just put the table name right in the connection string as before. The only thing I'm doing is using another function to wrap it in square brackets. This is the state of my code right now, and it's still not working (no error, it just returns 0 results, even though I have confirmed on SSMS that the parameters I'm feeding it should get 1 result):

def get_data_with_guid_filter(entity_set_name, filter_col, filter_val):
    connString = get_connection_string()
    conn = pyodbc.connect(connString)
    logger.debug(f'Attempting to fetch all data from {entity_set_name} where {filter_col} = {filter_val}.')
    entity_set_name = paramaterize(entity_set_name, "sysid")
    filter_col = paramaterize(filter_col, "sysid")
    filter_val = paramaterize(filter_val, "guid")
    params_tuple = (filter_col,) + (filter_val,)
    query = f"SELECT * FROM {entity_set_name} WHERE ? = ?"

    retValue = pd.read_sql(query, conn, params=params_tuple)

    returnCount = len(retValue.index)

    if returnCount == 0:
        logger.warning('Fetch failed: no data retrieved.')
    else:
        logger.info(f'{returnCount} row(s) retrieved.')

    return retValue

def paramaterize(param, param_type):
    if param_type == "guid":
        return f"(SELECT CONVERT(uniqueidentifier, '{param}'))"
    elif param_type == "string":
        return f"'{param}'"
    elif param_type == "sysid":
        return f"[{param}]"
    else:
        return param

Let's say the parameters are 'mytable', 'ID', and '123456' [not literally - just pretend that's a real GUID]. I can go into SSMS right now, enter SELECT * FROM [mytable] WHERE [ID] = (SELECT CONVERT(uniqueidentifier, '123456')), and get 1 result. But in Python, I get zero results and no error.

What's extra weird is I still get 0 results and no error even if I do either of these:

  • change filter_val = paramaterize(filter_val, "guid") to filter_val = paramaterize(filter_val, "string") (this is weird to me, because when I was trying the EXEC approach, I got an error that I understood to mean I could not filter a GUID column by an nvarchar value in a prepared statement!)
  • remove the outer brackets from the value that gets returned by filter_val = paramaterize(filter_val, "guid"), even though SSMS seems to require those brackets

Solution

  • The comments and a bit more digging helped guide me to the solution I was looking for. This allows me to have a function that dynamically makes an SQL call given table name, column name, and GUID parameters while also (AFAIK) providing security against SQL injection:

    def get_sql_entity_set_data(entity_set_name, filter_col=None, filter_val=None):
        connString = get_connection_string()
        conn = pyodbc.connect(connString)
        entity_set_name = paramaterize(entity_set_name)
    
        if filter_col is not None and filter_val is not None:
            filter_col = paramaterize(filter_col)
    
            query = f"""
                SET NOCOUNT ON;
                DECLARE @table nvarchar(128) = {entity_set_name};
                DECLARE @whereColumn nvarchar(128) = {filter_col};
                DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table) + ' WHERE ' + QUOTENAME(@whereColumn) + ' = ' + ?;
                EXEC sp_executesql @query
            """
    
            logger.debug(f'Attempting the following SQL command: {query}.')
            retValue = pd.read_sql(query, conn, params=(paramaterize(filter_val),))
        else:
            query = f"""
                SET NOCOUNT ON;
                DECLARE @table nvarchar(128) = {entity_set_name};
                DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table);
                EXEC sp_executesql @query
            """
    
            logger.debug(f'Attempting the following SQL command: {query}.')
            retValue = pd.read_sql(query, conn)
    
        returnCount = len(retValue.index)
    
        if returnCount == 0:
            logger.warning('Fetch failed: no data retrieved.')
        else:
            logger.info(f'{returnCount} row(s) retrieved.')
    
        return retValue
    
    def paramaterize(param):
        return f"'{param}'"
    

    Note: I learned that I couldn't use a ? for a column name either (just like I couldn't use one for a table name).