Search code examples
sqlpostgresqlfunctionplpgsql

PostgreSQL Function returning result set from dynamic tables names


In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.

I'd figure I can use these "bak_" tables to see the changes that have occurred during the test like this:

SELECT * FROM employee EXCEPT SELECT * FROM bak_employee

This will show me the inserted and updated records. I'll ignore the deleted records for now.

Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:

CREATE OR REPLACE FUNCTION public.show_diff()
  RETURNS SETOF diff_tables AS
$BODY$

DECLARE
  app_tables text;
BEGIN

    FOR app_tables IN
        SELECT table_name  

        FROM   information_schema.tables 

        WHERE  table_catalog = 'myDatabase' 
          AND  table_schema = 'public'
          AND  table_name not like 'bak_%'          -- exclude existing backup tables
    LOOP

        -- somehow loop through tables to see what's changed something like:
        EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;

    END LOOP;

    RETURN;
END;
$BODY$
LANGUAGE plpgsql;

But obviously this isn't going to return me any useful information. Any help would be appreciated.


Solution

  • You cannot return different well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type:

    CREATE OR REPLACE FUNCTION public.show_diff()
      RETURNS SETOF text
      LANGUAGE plpgsql AS   -- text!!
    $func$
    DECLARE
       app_table text;
    BEGIN
       FOR app_table IN
          SELECT table_name  
          FROM   information_schema.tables 
          WHERE  table_catalog = 'myDatabase' 
          AND    table_schema = 'public'
          AND    table_name NOT LIKE 'bak_%'   -- exclude existing backup tables
       LOOP
          RETURN NEXT ' ';
          RETURN NEXT '=== ' || app_table || ' ===';
          RETURN QUERY EXECUTE format(
            'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
           , app_table, 'bak_' || app_table);
       END LOOP;
    
       RETURN;
    END
    $func$;
    

    Call:

    SELECT * FROM public.show_diff();
    

    I had the test suggested by @a_horse at first, but after your comment I realized that there is no need for this. EXCEPT considers NULL values to be equal and shows all differences.

    While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: cheaper and does not run the risk of folding complete duplicates.

    TABLE is just syntactical sugar. See:

    However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.

    Crucial element is the cast the row type to text (x::text).

    You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type: