PostgreSQL Function returning result set from dynamic tables names

In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.

I'd figure I can use these "bak_" tables to see the changes that have occurred during the test like this:

SELECT * FROM employee EXCEPT SELECT * FROM bak_employee

This will show me the inserted and updated records. I'll ignore the deleted records for now.

Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:

CREATE OR REPLACE FUNCTION public.show_diff()
  RETURNS SETOF diff_tables AS
$BODY$

DECLARE
  app_tables text;
BEGIN

    FOR app_tables IN
        SELECT table_name  

        FROM   information_schema.tables 

        WHERE  table_catalog = 'myDatabase' 
          AND  table_schema = 'public'
          AND  table_name not like 'bak_%'          -- exclude existing backup tables
    LOOP

        -- somehow loop through tables to see what's changed something like:
        EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;

    END LOOP;

    RETURN;
END;
$BODY$
LANGUAGE plpgsql;

But obviously this isn't going to return me any useful information. Any help would be appreciated.

Solution

You cannot return different well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type:

CREATE OR REPLACE FUNCTION public.show_diff()
  RETURNS SETOF text
  LANGUAGE plpgsql AS   -- text!!
$func$
DECLARE
   app_table text;
BEGIN
   FOR app_table IN
      SELECT table_name  
      FROM   information_schema.tables 
      WHERE  table_catalog = 'myDatabase' 
      AND    table_schema = 'public'
      AND    table_name NOT LIKE 'bak_%'   -- exclude existing backup tables
   LOOP
      RETURN NEXT ' ';
      RETURN NEXT '=== ' || app_table || ' ===';
      RETURN QUERY EXECUTE format(
        'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
       , app_table, 'bak_' || app_table);
   END LOOP;

   RETURN;
END
$func$;

Call:

SELECT * FROM public.show_diff();

I had the test suggested by @a_horse at first, but after your comment I realized that there is no need for this. EXCEPT considers NULL values to be equal and shows all differences.

While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: cheaper and does not run the risk of folding complete duplicates.

Using EXCEPT clause in PostgreSQL

TABLE is just syntactical sugar. See:

Is there a shortcut for SELECT * FROM?

However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.

Crucial element is the cast the row type to text (x::text).

You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type:

Refactor a PL/pgSQL function to return the output of various SELECT queries