Search code examples
greenplumhawq

How to get total row count and max(timestamp) column in select list for all table in particular schema


We have Postgres based Read Only database. in that,we have 52 tables under one schema.

We are trying to output row count and max(timestamp) column for all tables under one schema.

Environment is :

PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 1.3.0.2 build 14421) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled

We tried on :

SELECT 
  nspname AS schemaname,relname,reltuples,max(time)
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

In this query, we got the row count column but still max(timestamp) for all table is not achieved.

Any help on it would be much appreciated ?


Solution

  • What you are accessing with this query is database statistics, which is not 100% accurate and might be missing or outdated depending on your statistics collection processes.

    To get the row count for a list of tables, you have to scan each of these tables. However you can use pg_relation_size() to get an idea of the table size in bytes, and this function does not require you scanning the table.

    If your table list is static, you can get away with a query like this:

    select 'table1', count(*), max(time) from table1
    union all
    select 'table2', count(*), max(time) from table2
    union all
    ...
    select 'table52', count(*), max(time) from table52;
    

    This solution is not flexible as if table list has changed, you need to rewrite your query.

    Second option is to generate this query and manually execute it:

    select string_agg(query, ' union all ') as query
        from (
            select 'select ''' || n.nspname || '.' || c.relname || ''', count(*), max(time) from ' || n.nspname || '.' || c.relname as query
                from pg_namespace as n, pg_class as c
                where n.oid = c.relnamespace
                    and n.nspname = 'my_schema'
            ) as q;
    

    This is more flexible, however the second query should be executed manually.

    And finally your last option - writing a function for doing so:

    create or replace function table_sizes (schemaname varchar) returns setof record as $BODY$
    declare
        r record;
        t varchar;
    begin
        for t in execute $$
            select n.nspname || '.' || c.relname
                from pg_namespace as n, pg_class as c
                where n.oid = c.relnamespace
                    and c.relkind = 'r'
                    and n.nspname = '$$ || schemaname || $$'$$
        loop
            execute 'select ''' || t || '''::varchar, count(*), max(time) from ' || t
                into r;
            return next r;
        end loop;
        return;
    end;
    $BODY$ language plpgsql volatile;
    
    select * from table_sizes('public') t(tablename varchar, rowcount bigint, maxtime time);