Search code examples
sqldb2query-performancestored-functionsdb2-luw

Query with function taking longer than same query with subselect


I have a function to get the balance for a customer.

CREATE OR REPLACE FUNCTION default1.get_balance (par_customer_id DECIMAL(31, 0))
  RETURNS DECIMAL(31,15)
  LANGUAGE SQL
  DETERMINISTIC
  NO EXTERNAL ACTION
  READS SQL DATA
BEGIN

  DECLARE var_balance DECIMAL(31,15); 

  SELECT SUM(amount)
  INTO var_balance
  FROM default1.accounting accounting 
  WHERE accounting.customer_id  = par_customer_id
    AND (YEAR(accounting.accounting_date) >= YEAR(SYSDATE)-3 or accounting.accounting_date IS NULL)
    AND paid_date IS NULL
    AND accounting_type_id <> 2
    AND NOT EXISTS (
        SELECT 1 
        FROM default1.accounting_detail detail 
        WHERE accounting.id = detail.accounting_id 
        AND detail.paid_date IS NOT NULL);

  RETURN var_balance;

END

The performance for getting the balance of one customer is good, but using the function in a query to get the balance for multiple customers at once it gets really slow.

SELECT default1.get_balance(customer.id), customer.*
FROM default1.customer customer
WHERE customer.id < 1000

This query takes over 2 minutes to execute.

When I replace the function in the query with a subselect it is much faster.

SELECT 
  (SELECT SUM(amount)
    FROM default1.accounting accounting
    WHERE accounting.customer_id = customer.id
    AND (YEAR(accounting.accounting_date) >= YEAR(SYSDATE)-3 or accounting.accounting_date IS NULL)
    AND paid_date IS NULL
    AND accounting_type_id <> 2
    AND NOT EXISTS (
        SELECT 1 
        FROM default1.accounting_detail detail 
        WHERE accounting.id = detail.accounting_id 
        AND detail.paid_date IS NOT NULL)),
  customer.*
FROM 
    default1.customer customer
WHERE customer.id < 1000

This query takes about 8 seconds.

I did execute the both queries multiple times in different orders without any significant change in the runtime. So I don't think that it is a caching issue.

Why does the query with the function takes about 15 times longer than the query with the subselect?
Is there anything I can change in the function to make it faster?


Solution

  • I'm assuming DB2 for LUW.

    Performance of your function can suffer because it uses a compiled compound statement as its body (BEGIN ... END). Try using an inlined compound statement: BEGIN ATOMIC ... END. Even better, you can probably simply use the RETURN statement only:

    CREATE OR REPLACE FUNCTION default1.get_balance (par_customer_id DECIMAL(31, 0))
      RETURNS DECIMAL(31,15)
      LANGUAGE SQL
      NOT DETERMINISTIC
      NO EXTERNAL ACTION
      READS SQL DATA
    RETURN SELECT SUM(amount)
      INTO var_balance
      FROM default1.accounting accounting 
      WHERE accounting.customer_id  = par_customer_id
        AND (YEAR(accounting.accounting_date) >= YEAR(SYSDATE)-3 or accounting.accounting_date IS NULL)
        AND paid_date IS NULL
        AND accounting_type_id <> 2
        AND NOT EXISTS (
            SELECT 1 
            FROM default1.accounting_detail detail 
            WHERE accounting.id = detail.accounting_id 
            AND detail.paid_date IS NOT NULL);
    

    When using a compiled compound statement, each invocation of the function causes a context switch from the SQL data access engine to the PSM execution engine and back, while the inlined statement becomes a part of the query plan itself.

    Note that you shouldn't declare this function as DETERMINISTIC, because it's not; mis-declaring a non-deterministic function could cause unexpected results.