With the following table:
CREATE TABLE `Example` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`properties` json DEFAULT NULL,
`hash` binary(20) GENERATED ALWAYS AS (unhex(sha(`properties`))) STORED,
PRIMARY KEY (`id`),
KEY `hash` (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
the column hash
is derived from the column properties
. In the nomenclature in the docs, {properties} -> {hash}
Also from the docs concerning handling of GROUP BY, comes the following:
SQL:1999 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers
However, despite this, the following query returns an error (no data is required in the table to reproduce this):
SELECT `properties` from `Example` GROUP BY `hash`;
The error is
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'dispatch.Example.properties' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
The error says the column is NOT functionally dependent. This is probably because the query analyzer doesn't assume that the value returned by the SHA
function is deterministic. Maybe? Is the possibility of a hash collision killing this whole idea?
The hash is considerably less useful if I still have to apply an aggregate function to the properties
column to populate the result. Is there a way I can assert to MySQL that there is functional dependence?
Failing that, what's the best way to say "just give me the property from any matching row" that doesn't involve comparing properties
records (which is the point of the hash)? The best I've come up with is FIRST
in a window function, but that feels janky.
It was the direction of the dependency that I had wrong; with the above example
Select `hash` FROM `Example` GROUP BY `properties`
works correctly, since hash
depends on properties
.
Since in my case hash
exists solely to be more efficient to index and group by, the above is not an option.
Not mentioned in the docs for aggregating functions is the function ANY_VALUE
that unpredictably returns the value from one of the grouped rows. Since I know the two columns are equivalent, I don't care which row it picks for properties
.
The resulting working query is:
SELECT ANY_VALUE(`properties`) from `Example` GROUP BY `hash`;