Here is the code of the (My)SQL function (it works outside of the function), but I can't manage to save it as a function for further reuse...
This is an example of the working query:
SELECT
(
SUM(Items_Available * Store_Sales) - (SUM(Items_Available) * SUM(Store_Sales)) / COUNT(*)
) / (
SQRT(
SUM(Items_Available * Items_Available) - (SUM(Items_Available) * SUM(Items_Available)) / COUNT(*)
) * SQRT(
SUM(Store_Sales * Store_Sales) - (SUM(Store_Sales) * SUM(Store_Sales)) / COUNT(*)
)
) as pearson_r
FROM
store_sales
I've extracted the business logic into this UDF:
DELIMITER $$
DROP FUNCTION IF EXISTS PEARSON_R $$
CREATE FUNCTION PEARSON_R(X INT, Y INT) RETURNS FLOAT DETERMINISTIC
BEGIN
RETURN (SUM(X * Y) - (SUM(X) * SUM(Y)) / COUNT(*)) / (SQRT(SUM(X * X) - (SUM(X) * SUM(X)) / COUNT(*)) * SQRT(SUM(Y * Y) - (SUM(Y) * SUM(Y)) / COUNT(*)));
END$$
DELIMITER ;
When I try to execute this code in command line, I get this useless error message:
> SELECT PEARSON_R(Items_Available, Store_Sales) FROM store_sales;
ERROR 1111 (HY000): Invalid use of group function
Do you have any idea?
I tried to simplify a lot the function but once I use a group function, I have this error.
The main problem is that your function refers to aggregate functions : COUNT
, SUM
... These are not valid in the scope of the function, that cannot refer, infer, or even assume anything about the context of the calling query.
In my opinion the function code should not even compile (but it does) ; we can reproduce the issue with just:
select PEARSON_R (1, 2);
Error: ER_INVALID_GROUP_FUNC_USE: Invalid use of group function
If you want to move some of the logic to a function, then it would need to take each and every aggregate value as an argument. The signature would look like:
CREATE FUNCTION PEARSON_R(
SUM_X INT, -- SUM(X)
SUM_Y INT, -- SUM(Y)
SUM_XY INT, -- SUM(X * Y)
SUM_XX INT, -- SUM(X * X)
SUM_YY INT, -- SUM(Y * Y)
CNT INT -- COUNT(*)
) RETURNS FLOAT DETERMINISTIC
Then you would invoke it like so in the query:
SELECT PEARSON_R(
SUM(X),
SUM(Y),
SUM(X * Y),
SUM(X * X),
SUM(Y * Y),
COUNT(*)
) as pearson_r
FROM store_sales;
Note that you could very well use a subquery (or a lateral join) rather than a function:
SELECT ( sum_xy - sum_x * sum_y) / ...
FROM (
SELECT PEARSON_R(
SUM(X) sum_x,
SUM(Y) sum_y,
SUM(X * Y) sum_xy,
SUM(X * X) sum_xx,
SUM(Y * Y) sum_yy,
COUNT(*) cnt
) as pearson_r
FROM store_sales
) t