Search code examples
mysqlsqlaggregate-functionssql-function

I can't find the issue with this SQL funtion


Here is the code of the (My)SQL function (it works outside of the function), but I can't manage to save it as a function for further reuse...

This is an example of the working query:

SELECT
  (
    SUM(Items_Available * Store_Sales) - (SUM(Items_Available) * SUM(Store_Sales)) / COUNT(*)
  ) / (
    SQRT(
      SUM(Items_Available * Items_Available) - (SUM(Items_Available) * SUM(Items_Available)) / COUNT(*)
    ) * SQRT(
      SUM(Store_Sales * Store_Sales) - (SUM(Store_Sales) * SUM(Store_Sales)) / COUNT(*)
    )
  ) as pearson_r
FROM
  store_sales

I've extracted the business logic into this UDF:

DELIMITER $$
DROP FUNCTION IF EXISTS PEARSON_R $$
CREATE FUNCTION PEARSON_R(X INT, Y INT) RETURNS FLOAT DETERMINISTIC
BEGIN
  
  RETURN (SUM(X * Y) - (SUM(X) * SUM(Y)) / COUNT(*)) / (SQRT(SUM(X * X) - (SUM(X) * SUM(X)) / COUNT(*)) * SQRT(SUM(Y * Y) - (SUM(Y) * SUM(Y)) / COUNT(*)));
END$$
DELIMITER ;

When I try to execute this code in command line, I get this useless error message:

> SELECT PEARSON_R(Items_Available, Store_Sales) FROM store_sales;

ERROR 1111 (HY000): Invalid use of group function

Do you have any idea?

I tried to simplify a lot the function but once I use a group function, I have this error.


Solution

  • The main problem is that your function refers to aggregate functions : COUNT, SUM... These are not valid in the scope of the function, that cannot refer, infer, or even assume anything about the context of the calling query.

    In my opinion the function code should not even compile (but it does) ; we can reproduce the issue with just:

    select PEARSON_R (1, 2);
    Error: ER_INVALID_GROUP_FUNC_USE: Invalid use of group function
    

    If you want to move some of the logic to a function, then it would need to take each and every aggregate value as an argument. The signature would look like:

    CREATE FUNCTION PEARSON_R(
        SUM_X  INT,     -- SUM(X)
        SUM_Y  INT,     -- SUM(Y)
        SUM_XY INT,     -- SUM(X * Y)
        SUM_XX INT,     -- SUM(X * X)
        SUM_YY INT,     -- SUM(Y * Y)
        CNT    INT      -- COUNT(*)
    ) RETURNS FLOAT DETERMINISTIC    
    

    Then you would invoke it like so in the query:

    SELECT PEARSON_R(
        SUM(X),
        SUM(Y),
        SUM(X * Y),
        SUM(X * X),
        SUM(Y * Y),
        COUNT(*)
    ) as pearson_r 
    FROM store_sales;
    

    Note that you could very well use a subquery (or a lateral join) rather than a function:

    SELECT ( sum_xy - sum_x * sum_y) / ...
    FROM (
        SELECT PEARSON_R(
            SUM(X) sum_x,
            SUM(Y) sum_y,
            SUM(X * Y) sum_xy,
            SUM(X * X) sum_xx,
            SUM(Y * Y) sum_yy,
            COUNT(*) cnt
        ) as pearson_r 
        FROM store_sales
    ) t