Search code examples
sqlfunctionpostgresqlset-returning-functionslateral

Create table from multiple calls to a function returning record


I have a function that does some basic statistics on a range of data based on a start and stop timestamp:

CREATE OR REPLACE FUNCTION cal(TIMESTAMP, TIMESTAMP, OUT Date_Time timestamp with time zone, OUT avg numeric, OUT stddev numeric, OUT Rstedv_per numeric) 
AS $$
SELECT
    max(datetime) as Date_Time,
    avg(SO2) AS Mean,
    stddev_samp(so2) as STD_DEV,
    stddev_samp(so2)/avg(SO2)*100 as Rstedv_Per
FROM Table43
WHERE datetime > $1 AND datetime < $2;
$$
 LANGUAGE SQL;

This works great with a simple single select such as this:

select * FROM
 cal('2014-08-02 05:29:00', '2014-08-02 05:32:00')

But now I am having trouble creating another function, or even a select statement that can combine multiple calls of the 'cal' function. For example I want to return a table that contains three time periods. So the return would be 4 columns by 3 rows:

'2014-08-02 05:29:00', '2014-08-02 05:32:00'
'2014-08-02 05:35:00', '2014-08-02 05:39:00'
'2014-08-02 05:45:00', '2014-08-02 05:49:00'

Solution

  • Use a VALUES expression to provide multiple rows of input dates. Then ...

    For all versions of Postgres

    SELECT cal(a, b)
    FROM  (
       VALUES 
          ('2014-08-02 05:29'::timestamp, '2014-08-02 05:32'::timestamp)
        , ('2014-08-02 05:35', '2014-08-02 05:39')
        , ('2014-08-02 05:45', '2014-08-02 05:39')
       ) v(a, b);
    

    You can replace the VALUES expression with an actual table.

    This returns whole rows as a single column (instead of individual columns). You could decompose in place with (cal(a, b)).*. While this works, it is inefficient. Due to a weakness in the Postgres parser, this would result in multiple evaluation of the function. Detailed explanation:

    Instead, use a subquery for better performance:

    SELECT (rec).*
    FROM  (
        SELECT cal(a, b)
        FROM  (
           VALUES 
              ('2014-08-02 05:29'::timestamp, '2014-08-02 05:32'::timestamp)
            , ('2014-08-02 05:35', '2014-08-02 05:39')
            , ('2014-08-02 05:45', '2014-08-02 05:39')
           ) v(a, b)
       ) sub;
    

    SQL Fiddle (for pg 8.3 to demonstrate it works in old versions).

    Since set-returning functions in the SELECT list are frowned upon by some, and are non-standard SQL.

    Postgres 9.3+

    That's the main reason why Postgres 9.3 introduced (SQL-standard compliant) LATERAL JOIN:

    SELECT f.*
    FROM  (
       VALUES 
          ('2014-08-02 05:29'::timestamp, '2014-08-02 05:32'::timestamp)
        , ('2014-08-02 05:35', '2014-08-02 05:39')
        , ('2014-08-02 05:45', '2014-08-02 05:39')
       ) v(a,b)
      , cal(v.a, v.b) f;
    

    The LATERAL JOIN is implicit here, since the second item in the FROM list references the previous table explicitly. Details:

    SQL Fiddle for current Postgres 9.3.