Search code examples
postgresqloptimizationset-returning-functions

PostgreSQL Set-Returning Function Call Optimization


I have the following problem with PostgreSQL 9.3.

There is a view encapsulating a non-trivial query to some resources (e.g., documents). Let's illustrate it as simple as

CREATE VIEW vw_resources AS
  SELECT * FROM documents; -- there are several joined tables in fact...

The client application uses the view usually with some WHERE conditions on several fields, and might also use paging of the results, so OFFSET and LIMIT may also be applied.

Now, on top of the actual resource list computed by vw_resources, I only want to display resources which the current user is allowed for. There is quite a complex set of rules regarding privileges (they depend on several attributes of the resources in question, explicit ACLs, implicit rules based on user roles or relations to other users...) so I wanted to encapsulate all of them in a single function. To prevent repetitive costly queries for each resource, the function takes a list of resource IDs, evaluates the privileges for all of them at once, and returns the set of the requested resource IDs together with the according privileges (read/write is distinguished). It looks roughly like this:

CREATE FUNCTION list_privileges(resource_ids BIGINT[])
  RETURNS TABLE (resource_id BIGINT, privilege TEXT)
AS $function$
BEGIN
  -- the function lists privileges for a user that would get passed in an argument - omitting that for simplicity
  RAISE NOTICE 'list_privileges called'; -- for diagnostic purposes
  -- for illustration, let's simply grant write privileges for any odd resource:
  RETURN QUERY SELECT id, (CASE WHEN id % 2 = 1 THEN 'write' ELSE 'none' END)
               FROM unnest(resource_ids) id;
END;
$function$ LANGUAGE plpgsql STABLE;

The question is how to integrate such a function in the vw_resources view for it to give only resources the user is privileged for (i.e., has 'read' or 'write' privilege).

A trivial solution would use a CTE:

CREATE VIEW vw_resources AS
  WITH base_data AS (
    SELECT * FROM documents
  )
  SELECT base_data.*, priv.privilege
  FROM base_data
       JOIN list_privileges((SELECT array_agg(resource_id) FROM base_data)) AS priv USING (resource_id)
  WHERE privilege IN ('read', 'write');

The problem is that the view itself gives too much rows - some WHERE conditions and OFFSET/LIMIT clauses are only applied to the view itself, like SELECT * FROM vw_resources WHERE id IN (1,2,3) LIMIT 10 (any complex filtering might be requested by the client application). And since PostgreSQL is unable to push the conditions down the CTE, the list_privileges(BIGINT[]) function ends up with evaluating privileges for all resources in the database, which effectively kills the performance.

So I attempted to use a window function which would collect resource IDs from the whole result set and join the list_privileges(BIGINT[]) function in an outer query, like illustrated below, but the list_privileges(BIGINT[]) function ends up being called repetitively for each row (as testified by 'list_privileges called' notices), which kinda ruins the previous effort:

CREATE VIEW vw_resources AS
  SELECT d.*, priv.privilege
  FROM (
    SELECT *, array_agg(resource_id) OVER () AS collected
    FROM documents
  ) AS d
  JOIN list_privileges(d.collected) AS priv USING (resource_id)
  WHERE privilege IN ('read', 'write');

I would resort to forcing clients to give two separate queries, the first taking the vw_resources without privileges applied, the second calling the list_privileges(BIGINT[]) function passing it the list of resource IDs fetched by the first query, and filtering the disallowed resources on the client side. It is quite clumsy for the client, though, and obtaining e.g. the first 20 allowed resources would be practically impossible as limiting the first query simply does not get it - if some resources are filtered out due to privileges then we simply don't have 20 rows in the overall result...

Any help welcome!

P.S. For the sake of completeness, I append a sample documents table:

CREATE TABLE documents (resource_id BIGINT, content TEXT);
INSERT INTO documents VALUES (1,'a'),(2,'b'),(3,'c');

Solution

  • If you must use plpgsql then create the function taking no arguments

    create function list_privileges()
      returns table (resource_id bigint, privilege text)
    as $function$
    begin
      raise notice 'list_privileges called'; -- for diagnostic purposes
      return query select 1, case when 1 % 2 = 1 then 'write' else 'none' end
      ;
    end;
    $function$ language plpgsql stable;
    

    And join it to the other complex query to form the vw_resources view

    create view vw_resources as
    select *
    from
        documents d
        inner join
        list_privileges() using(resource_id)
    

    The filter conditions will be added at query time

    select *
    from vw_resources
    where
        id in (1,2,3)
        and
        privilege in ('read', 'write')
    

    Let the planner do its optimization magic and check the explain output before any "premature optimization".

    This is just a conjecture: The function might make it harder or impossible for the planner to optimize.

    If plpgsql is not really necessary, and that is very frequent, I would just create a view in instead of the function

    create view vw_list_privileges as
    select
        1 as resource_id,
        case when 1 % 2 = 1 then 'write' else 'none' end as privilege
    

    And join it the same way to the complex query

    create view vw_resources as
    select *
    from
        documents d
        inner join
        vw_list_privileges using(resource_id)