Search code examples
sqlpostgresqlplpgsqldynamic-sql

Function returning a table for given table and column names


Lets say I have a table named static in two different databases houses and apartments.

static table contains the static information of houses such as house_size, no_rooms, pool, spa.

static table in houses database has columns like this:

pool    spa house_size  sauna   no_rooms
   1    1         25    1       2
   1    0         35    1       3

static table in apartments database has columns like this:

pool    spa house_size  sauna   
   1    1         25    1       
   1    0         35    1       

I want to run below query without raising any error. Currently, I get error as no_rooms column does not exists in apartments.public.static.

select pool, case when spa = 1 then 1 else 0 end as has_spa,
      sauna, house_size, case when no_rooms > 2 then 1 else 0 end as rooms 
from static;

Solution I tried:

WITH static_new AS (SELECT s.*     
FROM (SELECT 0 AS no_rooms) AS dummy 
LEFT JOIN LATERAL
( SELECT
      pool, spa, sauna, house_size, no_rooms
  FROM static
)  AS s on true)
SELECT * FROM static_new;

It works, but this query becomes messy when there are more columns involved.

What I am looking for:

  1. Create a function that takes column names and table name and then does the join which I am doing in above query and returns a table. (Should be generic and work for given column names and table name in parameter and return a table.)

  2. Any other nice and tidy solution?


Solution

  • SQL is a strictly typed language and Postgres functions must declare their return type. Returning a variable number of columns from a function is only possible with workarounds, like polymorphic types. See:

    But we can't work with the row type in your case, as that varies from database to database. The remaining option: return anonymous records and provide a column definition list with every call. I don't usually recommend this, as providing a column definition list with every call can be tedious - and often pointless. But yours might be one of the rare use cases where it makes sense.

    Still, you have to know the data type of possibly missing columns. I'll assume integer for the purpose of this demo. Else you have to pass data types additionally and build the query accordingly.

    CREATE OR REPLACE FUNCTION f_dynamic_select(_tbl regclass
                                              , _cols VARIADIC text[])  -- ①
      RETURNS SETOF record     -- ② anonymous records
      LANGUAGE plpgsql AS
    $func$
    BEGIN
       RETURN QUERY EXECUTE    -- ③ dynamic SQL
       format(
          'SELECT %s FROM %s'  -- ④ safe against SQLi
        , (
          SELECT string_agg(COALESCE(quote_ident(a.attname)
                                  , '0 AS ' || quote_ident(t.col)  -- assuming int!
                                    ), ', ' ORDER  BY t.ord) -- ⑤
          FROM   unnest(_cols) WITH ORDINALITY t(col, ord)   -- ⑤
          LEFT   JOIN pg_attribute a ON a.attrelid = _tbl    -- ⑥
                                    AND a.attnum > 0
                                    AND NOT a.attisdropped 
                                    AND a.attname = t.col
          )
        , _tbl
       );
    END
    $func$;
    

    Call (important!)

    SELECT *
    FROM   f_dynamic_select('static', 'pool', 'spa', 'sauna', 'house_size', 'no_rooms')
    AS     t(pool int, spa int, house_size int, sauna int, no_rooms int);  -- ② column definition list
    

    Your example call, with expressions based on these columns:

    SELECT pool, case when spa = 1 then 1 else 0 end as has_spa  -- ⑦ expressions
         , sauna, house_size
         , case when no_rooms > 2 then 1 else 0 end as rooms 
    FROM   f_dynamic_select('static', 'pool', 'spa', 'sauna', 'house_size', 'no_rooms')
    AS     t(pool int, spa int, house_size int, sauna int, no_rooms int);
    

    db<>fiddle here

    ① The function takes a table name as regclass type. See:

    ... followed by an arbitrary list of column names - in meaningful order. VARIADIC should be convenient for this. See:

    Pass column names as case-sensitive, single-quoted strings. Not (double-quoted) identifiers.

    ② This may be the first time ever I recommend returning anonymous records from a function - after over 1000 answers on the tag. The manual:

    If the function has been defined as returning the record data type, then an alias or the key word AS must be present, followed by a column definition list in the form ( column_name data_type [, ... ]). The column definition list must match the actual number and types of columns returned by the function.

    The manual on dynamic SQL.

    ④ Safe against SQL injection, because the table name is passed as regclass, and the SELECT list is concatenated using quote_ident() carefully. See:

    ⑤ Use WITH ORDINALITY to preserve original order of columns. See:

    LEFT JOIN to the system catalog pg_attribute to identify existing columns. See:

    ⑦ Move expressions building on the passed columns to the outer SELECT.


    Disclaimer: I would only introduce this level of sophistication if I had to. Maybe you can work with simple views in each database after all?