Guarantees when using user variables to number rows


Using user variables to number rows

I often find answers here on SO suggesting the use of user variables to number something or other. Perhaps the clearest example would be a query to select every second row from a given result set. (This question and query are similar to this answer, but it was this answer which actually triggered the question here.)

SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1

This approach usually seems to work.
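
For anyone who wants to try this, a minimal setup might look like the following sketch. The `id` column and the sample values are made up for illustration; only `tablename` and `ordercol` come from the query above.

```sql
-- Hypothetical sample data; only the names tablename and ordercol
-- are taken from the query in the question.
CREATE TABLE tablename (
  id       INT PRIMARY KEY,
  ordercol INT NOT NULL
);

INSERT INTO tablename (id, ordercol) VALUES
  (1, 50), (2, 10), (3, 40), (4, 20), (5, 30);

-- The row-numbering query from above, unchanged.
SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1;
```

If the assignments run once per row and in ORDER BY order, the rows with ordercol 10, 30 and 50 come back (rownum 1, 3 and 5). Whether that "if" always holds is exactly what the question is about.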

Reasons to be careful

On the other hand, the MySQL docs state:

As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.

Core question

So my question is not how to achieve such an ordering using current servers, but instead whether the suggested solution using user variables is guaranteed to work under all (reasonable) circumstances and for all future versions of MySQL.

By “guarantees” I mean authoritative sources like the MySQL documentation or some standard with which MySQL claims conformance. Lacking such authoritative answers, other sources like often-used tutorials or parts of the MySQL source code might be quoted instead. By “works” I mean the fact that the assignments will be executed sequentially, once per row of the result, and in the order induced by the ORDER BY line.

Example of a breaking query

To give you an example of how easily things fail:

SELECT *
FROM (SELECT *, (@row := @row + 1) AS rownum
      FROM (SELECT @row := 0) AS init, tablename
      HAVING rownum > 0
      ORDER BY tablename.ordercol
     ) sub
WHERE rownum % 2 = 1

will produce an empty result on the MySQL 5.5.27 currently installed on SQL Fiddle. The reason appears to be that the HAVING condition causes the rownum expression to get evaluated twice, so the final result will only have even numbers. I have an idea of what's going on behind the scenes, and I'm not claiming that the query with the HAVING makes much sense. I just want to demonstrate that there is a fine line between code which works and code which looks very similar but breaks.
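
One way to look at the double evaluation is to run the inner query by itself and inspect the rownum column. This is only a sketch: it assumes the same `tablename` and `ordercol` as the queries above, and the values returned may differ by server version and execution plan.

```sql
-- Inner query only, so the generated numbers are visible directly.
SELECT *, (@row := @row + 1) AS rownum
FROM (SELECT @row := 0) AS init, tablename
HAVING rownum > 0
ORDER BY tablename.ordercol;
```

If rownum comes back as 2, 4, 6 and so on, the expression was evaluated twice per row, which would explain why the outer `rownum % 2 = 1` filter matches nothing.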


Solution

  • You misread the statement. It relates to the order of expressions in the SELECT list, when using multiple variables.
    As presented, the ORDER BY on this single-variable statement has a guaranteed order up to the current version of MySQL and nothing in that text suggests it will change.

    But guarantee the future? Who knows.


    Regarding the breaking query, you've again misunderstood how MySQL works. Let's break down your query. Take note of this statement in the manual:

    In a SELECT statement, each select expression is evaluated only when sent to the client. This means that in a HAVING, GROUP BY, or ORDER BY clause, referring to a variable that is assigned a value in the select expression list does not work as expected

    The order of processing of queries is roughly:

    FROM / JOIN
    WHERE / ON
    GROUP BY / ROLLUP
    HAVING
    UNION
    SELECT
    ORDER BY
    @variable resolution
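
    Read against the working query from the question, the rough order above can be annotated as in the sketch below. The comments follow the simplified list in this answer; they are not a statement of what the optimizer actually does in every version.

    ```sql
    SELECT *                                      -- outer SELECT: runs after the subquery is done
    FROM (SELECT *, (@row := @row + 1) AS rownum  -- inner select list: @row resolved last
          FROM (SELECT @row := 0) AS init,        -- inner FROM / JOIN: processed first
               tablename
          ORDER BY tablename.ordercol             -- inner ORDER BY: before @variable resolution
         ) sub
    WHERE rownum % 2 = 1;                         -- outer WHERE: rownum is now an ordinary column
    ```

    Under this reading, no clause at the inner level reads rownum, so each row's assignment is only needed once.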
    

    Your "broken" query attempts to use the variable within the same query level, which is just about as sinful as using a WHERE/HAVING clause against a column alias. That's why you'll never see MySQL variable-based row-numbering solutions using the variable at the same query level; it is always in a subquery. The outer query can be considered the client of the inner query, at which stage the variable/placeholder expression has been rendered. By your argument, you can just as easily break it using a WHERE clause involving @row directly (yes, it will run!).
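
    A sketch of what such a WHERE-based variant might look like is shown below. The extra condition is made up for illustration and is not taken from the question; the numbers it produces are deliberately not stated here, since they depend on how often and in what order the server evaluates each expression.

    ```sql
    SELECT *
    FROM (SELECT *, (@row := @row + 1) AS rownum
          FROM (SELECT @row := 0) AS init, tablename
          -- Hypothetical condition: a second assignment to @row
          -- at the same query level as the select-list assignment.
          WHERE (@row := @row + 1) > 0
          ORDER BY tablename.ordercol
         ) sub
    WHERE rownum % 2 = 1;
    ```

    The statement is accepted, but @row is now assigned in two places at the same level, so rownum can no longer be relied on to count rows one by one.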