Search code examples
mysqlsqlparameterized

SQL parameterization: How does this work behind the scenes?


SQL parameterization is a hot topic nowadays, and for a good reason, but does it really do anything besides escaping decently?

I could imagine a parameterization engine simply making sure the data is decently escaped before inserting it into the query string, but is that really all it does? It would make more sense to do something differently in the connection, e.g. like this:

> Sent data. Formatting: length + space + payload
< Received data
-----
> 69 SELECT * FROM `users` WHERE `username` LIKE ? AND `creation_date` > ?
< Ok. Send parameter 1.
> 4 joe%
< Ok. Send parameter 2.
> 1 0
< Ok. Query result: [...]

This way would simply eliminate the issue of SQL injections, so you wouldn't have to avoid them through escaping. The only other way I can think of how parameterization might work, is by escaping the parameters:

// $params would usually be an argument, not in the code like this
$params = ['joe%', 0];

// Escape the values
foreach ($params as $key=>$value)
    $params[$key] = mysql_real_escape_string($value);

// Foreach questionmark in the $query_string (another argument of the function),
// replace it with the escaped value.
$n = 0;
while ($pos = strpos($query_string, "?") !== false && $n < count($params)) {
    // If it's numeric, don't use quotes around it.
    $param = is_numeric($params[$n]) ? $params[$n] : "'" . $params[$n] . "'";
    // Update the query string with the replaced question mark
    $query_string = substr($query_string, 0, $pos) //or $pos-1? It's pseudocode...
                  . $param
                  . substr($query_string, $pos + 1);
    $n++;

If the latter is the case, I'm not going to switch my sites to parameterization just yet. It has no advantage that I can see, it's just another strong vs weak variable typing discussion. Strong typing may catch more errors in compiletime, but it doesn't really make anything possible that would be hard to do otherwise - same with this parameterization. (Please correct me if I'm wrong!)


Update:

  • I knew this would depend on the SQL server (and also on the client, but I assume the client uses the best possible techniques), but mostly I had MySQL in mind. Answers concerning other databases are (and were) also welcome though.
  • As far as I understand the answers, parameterization does indeed do more than simply escaping the data. It is really sent to the server in a parameterized way, so with variables separated and not as a single query string.
  • This also enables the server to store and reuse the query with different parameters, which provides better performance.

Did I get everything? One thing I'm still curious about is whether MySQL has these features, and if query reusage is automatically done (or if not, how this can be done).

Also, please comment when anyone reads this update. I'm not sure if it bumps the question or something...

Thanks!


Solution

  • I'm sure that the way that your command and parameters are handled will vary depending on the particular database engine and client library.

    However, speaking from experience with SQL Server, I can tell you that parameters are preserved when sending commands using ADO.NET. They are not folded into the statement. For example, if you use SQL Profiler, you'll see a remote procedure call like:

    exec sp_executesql N'INSERT INTO Test (Col1) VALUES (@p0)',N'@p0 nvarchar(4000)',@p0=N'p1'
    

    Keep in mind that there are other benefits to parameterization besides preventing SQL injection. For example, the query engine has a better chance of reusing query plans for parameterized queries because the statement is always the same (just the parameter values change).

    In response to update: Query parameterization is so common I would expect MySQL (and really any database engine) to handle it similarly.

    Based on the MySQL protocol documentation, it looks like prepared statements are handled using COM_PREPARE and COM_EXECUTE packets, which do support separate parameters in binary format. It's not clear if all parameterized statements will be prepared, but it does look like unprepared statements are handled by COM_QUERY which has no mention of parameter support.

    When in doubt: test. If you really want to know what's sent over the wire, use a network protocol analyzer like Wireshark and look at the packets.

    Regardless of how it's handled internally and any optimizations it may or may not currently provide for a given engine, there's very little (nothing?) to gain from not using parameters.