I'm fairly new to queries that involve variable declarations in MySQL. I have seen various styles, and I'm not fully clear on what they actually do, so I have a few questions.
1)
set @row:=0;
SELECT name, @row:=@row + 1 AS rownum
FROM animal
2)
SELECT name, @row:=@row + 1 AS rownum
FROM (SELECT @row:= 0) c, animal
Both return the same:
name rownum
|| cat || 1 ||
|| cat || 2 ||
|| dog || 3 ||
|| dog || 4 ||
|| dog || 5 ||
|| ant || 6 ||
What are the differences in the above two queries and which of the two to adopt as to their scope, efficiency, coding habit, use-cases?
3) Now if I do this:
set @row:=0;
SELECT name, @row:=@row + 1 AS rownum
FROM (SELECT @row:= 123) c, animal
I get
name rownum
|| cat || 124 ||
|| cat || 125 ||
|| dog || 126 ||
|| dog || 127 ||
|| dog || 128 ||
|| ant || 129 ||
So doesn't that mean that the inner variable initialization overrides the outer initialization, leaving the latter redundant (and hence that it's always better practice to initialize in a SELECT)?
4) If I merely do:
SELECT name, @row:=@row + 1 AS rownum
FROM animal
I get
name rownum
|| cat || NULL ||
|| cat || NULL ||
|| dog || NULL ||
|| dog || NULL ||
|| dog || NULL ||
|| ant || NULL ||
I can understand that, since row isn't initialized. But if I run any of the other queries first (maybe the variable row is getting initialized?), I see that row is incremented every time I run the above query. That is, it gives me this result on the first run:
name rownum
|| cat || 1 ||
|| cat || 2 ||
|| dog || 3 ||
|| dog || 4 ||
|| dog || 5 ||
|| ant || 6 ||
and then, when re-run, it yields
name rownum
|| cat || 7 ||
|| cat || 8 ||
|| dog || 9 ||
|| dog || 10 ||
|| dog || 11 ||
|| ant || 12 ||
So is row being stored somewhere? And what is its scope and lifespan?
5) If I have query like this:
SELECT (CASE WHEN @name <> name THEN @row:=1 ELSE @row:=@row + 1 END) AS rownum,
@name:=name AS name
FROM animal
This always yields the right result:
rownum name
|| 1 || cat ||
|| 2 || cat ||
|| 1 || dog ||
|| 2 || dog ||
|| 3 || dog ||
|| 1 || ant ||
So doesn't that mean it's not always necessary to initialize the variable at the top or in a SELECT, depending on the query?
Make sure to read the manual section on user variables.
What are the differences in the above two queries and which of the two to adopt as to their scope, efficiency, coding habit, use-cases?
Query 1) uses multiple statements. It can therefore rely on the order in which these statements are executed, ensuring that the variable is set before it gets incremented.
Query 2), on the other hand, does the initialization in a nested subquery. This turns the whole thing into a single query, so you don't risk forgetting the initialization. But the code relies more heavily on the internal workings of the MySQL server, particularly the fact that it will execute the subquery before it starts computing results for the outer query.
So doesn't that mean that the inner variable initialization overrides the outer initialization, leaving the latter redundant (and hence that it's always better practice to initialize in a SELECT)?
This is not about inner and outer, but about sequential order: the subquery is executed after the SET, so it will simply overwrite the old value.
So is row being stored somewhere? And what is its scope and lifespan?
User variables are local to the server connection. So any other process will be unaffected by the setting. Even the same process might maintain multiple connections, with independent settings of user variables. Once a connection is closed, all variable settings are lost.
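A minimal sketch of that lifespan, assuming all three statements are sent over the same connection (the column alias n is only for illustration):

```sql
-- Run these in one connection (session).
SET @row := 0;
SELECT @row := @row + 1 AS n;  -- returns 1
SELECT @row := @row + 1 AS n;  -- returns 2: the value survived between statements

-- In a second, separate connection, @row is a different variable:
-- SELECT @row; returns NULL there until that connection sets it.
```

This is why re-running query 4) keeps counting upwards: nothing resets the variable between statements in the same connection.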
So doesn't that mean it's not always necessary to initialize the variable at the top or in a SELECT, depending on the query?
Quoting from the manual:
If you refer to a variable that has not been initialized, it has a value of NULL and a type of string.
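To illustrate the quoted behaviour, here is a small sketch. The name @never_set is hypothetical and is assumed not to have been assigned earlier in the session:

```sql
-- @never_set: hypothetical variable, assumed unassigned in this session
SELECT @never_set;                   -- NULL
SELECT @never_set + 1;               -- NULL: arithmetic on NULL yields NULL
SELECT @never_set <> 'cat';          -- NULL: CASE WHEN treats this as "not true"
SELECT COALESCE(@never_set, 0) + 1;  -- 1: substitute a default before using it
```

The third line matters for query 5): while @name is still NULL, the comparison is not true, so the ELSE branch runs.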
So you can use a variable before it is initialized, but you have to be careful that you can actually deal with the resulting NULL value in a reasonable way. Note, however, that your query 5) suffers from another problem explicitly stated in the manual:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server. In SELECT @a, @a:=@a+1, ..., you might think that MySQL will evaluate @a first and then do an assignment second. However, changing the statement (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an execution plan with a different order of evaluation.
So in your case, the @name:=name part could get executed before the @name <> name check, causing all your rownum values to be the same. So even if it does work for now, there are no guarantees that it will work in the future.
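If the server version allows it, one way to avoid depending on evaluation order is a window function. This is only a sketch and rests on an assumption not stated in the question: MySQL 8.0 or later, where ROW_NUMBER() is available.

```sql
-- Assumes MySQL 8.0+ (window functions); no user variables involved.
SELECT name,
       ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rownum
FROM animal;
```

The numbering restarts for each distinct name, as in query 5). The order in which the groups appear in the result is not fixed unless an outer ORDER BY is added.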
Note that I've been very sceptical about using user variables in this fashion. I've already quoted the above warning from the manual in comments on several answers. I've also asked questions like the one about Guarantees when using user variables to number rows. Other users are more pragmatic, and therefore more willing to use code that appears to work without express guarantees that things will continue to work as intended.