Search code examples
postgresqlset-returning-functions

Difference Between Querying Table Directly and Querying Function that Returns that Same Table


I would like to have a function that returns a TABLE. I know the user can use the function call in selects and joins just like a table. However, will the select/join be able to use the indexes of the source table that the function TABLE returned?

For example: Will "select id from permitted_resources() where id = 1" be the same as "select id from resources where id = 5"? (Assuming that there is an index on resources table id column.)

CREATE OR REPLACE FUNCTION permitted_resources()
  RETURNS TABLE (id   int, name varchar(10)) AS
$func$
BEGIN
   RETURN QUERY
   SELECT r.id, r.name from resources r; 
END
$func$  LANGUAGE plpgsql;

Solution

  • Will "select id from permitted_resources() where id = 1" be the same as "select id from resources where id = 5"?

    No it will not. A PL/pgSQL function is a black box for the optimizer.

    If you want to achieve something like that, use a language sql function:

    CREATE OR REPLACE FUNCTION permitted_resources()
      RETURNS TABLE (id   int, name varchar(10)) AS
    $func$
       SELECT r.id, r.name from resources r; 
    $func$  
    LANGUAGE sql
    stable;
    

    We can test this with the following setup:

    create table test 
    (
      id integer primary key, 
      some_nr integer default random() * 1000 + 1,
      some_date date default current_date,
      some_text text default md5(random()::text)
    );
    
    insert into test (id) 
    select *
    from generate_series(1,1e6);
    

    Now create one PL/pgSQL function:

    create function get_data1()
    returns setof test
    as
    $$
    begin
     return query
       select *
       from test;
    end;   
    $$
    language plpgsql
    stable;
    

    And a SQL function:

    create function get_data2()
    returns setof test
    as
    $$
     select *
     from test;
    $$
    language sql
    stable;
    

    Let's see how the execution plans look:

    explain (analyze)
    select *
    from get_data1() -- this is the PL/pgSQL function
    where id = 1234; 
    

    Yields the following execution plan:

    Function Scan on get_data1  (cost=0.25..4.75 rows=5 width=44) (actual time=261.033..361.218 rows=1 loops=1)
      Filter: (id = 1234)
      Rows Removed by Filter: 999999
    Planning Time: 0.033 ms
    Execution Time: 371.302 ms
    

    Apparently it first retrieves all rows, then discards them again

    However,

    explain (analyze)
    select *
    from get_data2() -- the "SQL" function
    where id = 1234; 
    

    Yields the following execution plan:

    Index Scan using test_pkey on test  (cost=0.42..2.43 rows=1 width=45) (actual time=0.015..0.017 rows=1 loops=1)
      Index Cond: (id = 1234)
    Planning Time: 0.119 ms
    Execution Time: 0.031 ms
    

    The function isn't even mentioned anymore in the plan. Not surprisingly, a plain select yields the same plan:

    explain (analyze)
    select *
    from test
    where id = 1234;
    
    Index Scan using test_pkey on test  (cost=0.42..2.43 rows=1 width=45) (actual time=0.014..0.014 rows=1 loops=1)
      Index Cond: (id = 1234)
    Planning Time: 0.058 ms
    Execution Time: 0.026 ms
    

    I don't know if this holds true for more complex queries, but a simple join between such a function and another table shows the same behaviour.