Search code examples
postgresqlplpgsqlsql-function

PostgreSQL function iterating through a query, how to return a table


I have a table with an id, time, and PostGIS point object column. The table contains multiple points for each ID at different times. I want to find the average distance of one ID's set of points to another ID's set of points, for each point that happens at the same time, and find that average for all other ID's set of points compared to that original ID's set.

So far I have this function:

CREATE TYPE score AS (id int, dist float);
CREATE OR REPLACE FUNCTION avgdist(id1 int) RETURNS TABLE (id int, dist float) LANGUAGE plpgsql AS 
$func$ 
    DECLARE 
    scores score; 
    id2 int; 
    set2 record; 
    begin 
        id2:= 0; 
        IF (id1 = id2 ) THEN 
            id2:= 1; 
        END IF; 
        FOR set2 IN 
        SELECT my_table.id, my_table.time, my_table.geom FROM my_table WHERE my_table.id = id2 loop 
            id2:= id2 + 1; 
            CONTINUE WHEN id1 = id2; 
            EXECUTE 'WITH origin AS (SELECT time, id, geom FROM my_table WHERE id = $1)
                SELECT id, avg(ST_Distance(origin.geom, $2)) 
                FROM origin WHERE origin.time = $3 
                group by origin.id 
                ORDER BY id' 
                    INTO scores 
                    USING id1, set2.geom, set2.time; 
        end loop; 
        RETURN; 
    end 
$func$;

When calling this function with select * from avgdist(2) I do not get any results, same with slect avgdist(2). I do get results when I run the query within the execute on it's own in psql with plugged in values.

I am new to making functions in sql, so i don't really understand how to put the scores in the return table when that table doesn't have a name. And I can't seem to use RETURN QUERY since I need to return the results for all the queries in the loop.

Help on this would be appreciated, or is there a better way to achieve my desired results without looping?


Solution

  • There are more issues:

    • Clause RETURNS TABLE defines OUT variables. You can use these variables.

      CREATE OR REPLACE FUNCTION foo()
      RETURNS TABLE (r1 int, r2 int)
      -- you don't need aux variables for result
      ...
        r1 := 10; r2 := 10;
        RETURN NEXT;
      
    • EXECUTE INTO can store only first row (or value) from of dynamic query result.

    • RETURN stops function evaluation. You should to use RETURN NEXT or RETURN QUERY.

      FOR x, y IN SELECT ..
      LOOP
        -- when only first row of result is interesting
        EXECUTE '..' INTO r1, r2 USING x, y;
        RETURN NEXT;
      END LOOP
      

      or with RETURN QUERY EXECUTE

      FOR x, y IN SELECT ..
      LOOP
        RETURN QUERY EXECUTE '..' USING x, y;
      END LOOP
      
    • I don't understand to your code, but it looks so you don't need to use dynamic SQL - EXECUTE commands. The dynamic SQL is necessary only when you have a variable on place of SQL identifiers. There is not this case. And same case - why you use CTE WITH clause. It is not necessary - and it can has negative impacts (unwanted materialization - fixed by PostgreSQL 12). Use just RETURN QUERY (without EXECUTE) and pass query as query (not as a string).

    The documentation about these commands is good - https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-STATEMENTS-RETURNING