I have some SQL where i need to group some rows in a query, and use a random value for each row after grouping, so that the random value will be evenly distributed for each row after grouping, but i am not sure how mySql is handling this
see this very simplified example:
CREATE TABLE IF NOT EXISTS soldier (
unit VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
personal_number INT
)
select unit, count(name), rand()
from soldier
group by unit
so i tried to enter 3 soldiers from one unit and one soldier from another, and the random column seems to be distributed rather similarly in the results set. but i suspect that what might be happening is that maybe the value for the grouped row is taken always from the smallest value soldier row, or from largest, and then the distribution would be skewed. if OTOH it is taken randomly from any of the grouped soldiers, or averaged, or being calculated after grouping, i am good(as far as i can tell). does anyone know how rand() is calculated in such cases?
I forgot about the basics. Your query is evaluated in this order:
FROM, GROUP BY, COUNT, SELECT
In your example, MySQL creates two groups before select:
The RAND()
function will be called once for each row after grouping. Your query, in its current form, is correct.
Note than RAND()
can safely be used inside SELECT
because it is a function, not a column reference. And just to be sure, I replaced it with a UDF that records how many times it was called; it was called twice for your example.