Search code examples
sqlpostgresqldynamic-sqlinformation-schema

PostgreSQL dynamic query: find most recent timestamp of several unrelated tables


I have a number of tables, and many of them have a timestamp column. I can get a list of every table with a timestamp column:

SELECT table_name 
FROM information_schema.columns 
WHERE table_schema='my_schema' AND column_name='timestamp';

--------------
 table_name               
--------------
 apples
 bananas
 sharks
 lemons

I can get the hightest timestamp in the sharks table as follows

SELECT MAX(timestamp) FROM sharks;

-------------------------
           max           
-------------------------
 2021-11-24 00:00:00.000

I would like to get a table like

 table_name  |       last_updated               
-------------+-------------------------
 apples      | 2021-11-23 00:02:00.000
 bananas     | 2019-10-16 00:04:00.000
 sharks      | 2021-11-24 00:00:00.000
 lemons      | 1970-01-03 10:00:00.000

I'm suspecting this requires dynamic SQL, so I'm trying something like

SELECT (
    EXECUTE 'SELECT MAX(timestamp) FROM my_schema.' || table_name
) FROM (
    SELECT table_name 
    FROM information_schema.columns 
    WHERE table_schema='my_schema' AND column_name='timestamp'
);

But it seems like EXECUTE doesn't work in subqueries.

Performance is not particularly a concern, just producing the desired results.


Solution

  • Dynamic Queries cannot function outside of PL/pgSQL blocks, so you need to wrap your code in one.

    I set up test tables similar to yours, with only the "timestamp" column shared between them:

    drop table if exists public.sharks_70099803 cascade; 
    create table public.sharks_70099803 
    as  select  1::int integer_column, 
                now()::timestamp as "timestamp";
    drop table if exists public.bananas_70099803 cascade; 
    create table public.bananas_70099803
    as select   'some text'::text text_column, 
                now()::timestamp as "timestamp";
    

    Wrap a dynamic query in PL/pgSQL function. Inside I build a query to pull the max(timestamp) from each table with the column, then aggregate those into one query with a union all in between, that I later execute.

    CREATE OR REPLACE FUNCTION public.test_70099803()
    RETURNS SETOF RECORD
    LANGUAGE 'plpgsql'
    AS $BODY$
    BEGIN 
        return query 
        execute (
            select string_agg(select_per_table,' union all ') 
            from (  
                select 'select '''||
                        table_name||
                        ''' as table_name, max(timestamp) from public.'||
                        table_name "select_per_table" 
                from information_schema.columns 
                where table_schema='public' 
                and column_name='timestamp'
            ) a
        );
    END 
    $BODY$;
    
    select * from public.test_70099803() as t(table_name text, max_timestamp timestamp);
    --    table_name    |       max_timestamp
    --------------------+----------------------------
    -- sharks_70099803  | 2021-11-24 17:12:03.24951
    -- bananas_70099803 | 2021-11-24 17:12:03.253614
    --(2 rows)
    

    You can parametrise your function to be applicable to more groups of tables, or to have a predefined output table structure that'll let you just select * from test_70099803();