Search code examples
mysqlsqlfingerprintpercona

Percona query fingerprinting - why does order matter in select columns?


Suppose I have two queries:

SELECT name, description FROM items WHERE id = 5;
SELECT description, name FROM items WHERE id = 5;

When I use pt-fingerprint from the Percona Toolkit on these queries, they give a different fingerprint:

select name, description from items where id = ?;
select description, name from items where id = ?;

Since they are essentially the same query, shouldn't they give the same fingerprint? Am I missing something?


Solution

  • Yes, you're right, the order of columns in the select-list of a query makes no difference to performance.

    But treating those two queries as the same fingerprint could obscure the source of the query.

    Suppose you have those two queries, each one is in a different part of your application. You might like to know which one is responsible for 40% of your query load and which is responsible for only 2% of your query load.

    It would also be much more complex to produce the fingerprint of a query if it had to detect the commutativity of columns as you describe. That would also apply to boolean terms in the WHERE clause, and to some extent order of joined tables in the FROM clause, and order of unioned queries in a UNION, too.

    The code for fingerprint() is only about 100 lines of code to do pattern-matching implemented with regular expressions. Doing what you describe would require a full-blown SQL parser. See the code here: https://github.com/percona/percona-toolkit/blob/3.0/lib/QueryRewriter.pm