Search code examples
databasepostgresqlplpgsql

How to list tables of specific database and sort them by size with PostgreSQL?


I have tried to select a list of databases and select all tables from each database, ordering the first ten tables by their size - including their indexes- through PL/pgSQL, but I have had a problem to join the query for the database with the query for the tables and their sizes included in the specific database.

I have started like this:

    DO $$
DECLARE
  database_name pg_database.datname%TYPE;
  total_table_size pg_tables.tablename%TYPE;
    rec record;
BEGIN
    FOR rec IN SELECT datname
  FROM pg_database
LOOP
  database_name := rec.datname;
 
  raise notice 'Database name: %', database_name;
 
  SELECT tablename,
  pg_total_relation_size(table_name) AS total_table_size
  FROM pg_tables
  INTO table_name, total_table_size
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;
 END LOOP;
END;
$$;

I do not know how to specify that I want the first ten table names and their sizes ordered from the largest to the smallest of the present selected database. Could anyone help me with it, please?

I thought about joining the queries somehow, but I have not found a column that I could use to make the join.

I have searched similar problems in the community, but I did not find something so specific.

Thanks in advance.


Solution

    1. You're already limiting the result to just top 10 rows through the addition of LIMIT 10. You're just duplicating the same 10 rows as many times as there are databases in your catalog, but still those are results for the current database you're connected to.

    2. You're probably familiar with all Database Object Size Functions but while it's good to order by the raw byte size, run it through pg_size_pretty() in your select section to make it readable.

    3. Don't forget about schemas/namespaces. You can have multiple of those in a single database, and a table of the same name in each one. In pg_tables it's under schemaname, in information_schema.tables it's under table_schema. The recommendation is

      Since the information schema is SQL-standard whereas the views described here are PostgreSQL-specific, it's usually better to use the information schema if it provides all the information you need.

    4. While you can list all databases, roles and tablespaces available on the cluster in system catalogs, there's no cluster-wide catalog for namespaces and tables. There's no native/built-in/standard way in PostgreSQL to switch databases from inside the session, nor is there a way to reference objects in other databases, be it on the same cluster or elsewhere. To be able to do these things from within your session, you need postgres_fdw or dblink both of which actually start and additional client on your behalf.

      Off-db, you can simply instruct your client to open and close connections wherever and however you want (the idea with \c or \connect -reuse-previous=on in psql) as long as you're fine shoveling the data yourself and having to maintain a separate utility.

    5. There are pre-built external tools you can use to monitor PostgreSQL.

    6. CREATE VIEW v_pg_total_relation_sizes on each of your databases (here are examples you can just prepend with create view ... as). To make it present by default in databases created in the future, create one in template1 database as well. Make them visible through postgres_fdw, then create a view that's a union of the local one and all the linked ones. Depending on how large your databases are, if this gets slow you might want to consider making it a materialized view to cache it.

    Here's a demo for point 6 above: (db<>fiddle)

    create table local_table as
    select generate_series(1,2e5,1), (gen_random_uuid())::text;
    
    create view v_pg_total_relation_sizes as 
    SELECT nspname || '.' || relname AS "relation",
        pg_size_pretty(pg_total_relation_size(C.oid)) AS "readable_size",
        pg_total_relation_size(C.oid) as size
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
    ORDER BY pg_total_relation_size(C.oid) DESC;
    

    Connect to your template1 database (it's always created during initial PostgreSQL setup since it's meant to be just that: a template for all new databases) and create similar objects:

    create schema neighbour_schema;
    
    create table neighbour_schema.neighbour_table as
    select generate_series(1,3e5,1), (gen_random_uuid())::text;
    
    create view neighbour_schema.v_pg_total_relation_sizes as 
    SELECT nspname || '.' || relname AS "relation",
        pg_size_pretty(pg_total_relation_size(C.oid)) AS "readable_size",
        pg_total_relation_size(C.oid) as size
      FROM pg_class C
      LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
      WHERE nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
      ORDER BY pg_total_relation_size(C.oid) DESC;
    
    create database neighbour_db;
    

    Now that they are in template1, all new databases created with create database will have it from the start.

    When setting up postgres_fdw (and/or dblink), note that by default it'll use local Unix domain socket if you don't supply host and user information. For different users and hosts make sure you're also keeping pg_hba.conf up to date on all instances you connect/link to.

    On your main db:

    create extension postgres_fdw;
    
    CREATE SERVER foreign_server
            FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname 'neighbour_db');
    CREATE USER MAPPING FOR current_user
            SERVER foreign_server
            OPTIONS (user 'postgres');
    CREATE FOREIGN TABLE fdw_v_pg_total_relation_sizes (
            relation text,
            readable_size text,
            size bigint )
      SERVER foreign_server
      OPTIONS (schema_name 'neighbour_schema', 
               table_name 'v_pg_total_relation_sizes');
    

    Set up the materialized view and UNION ALL the local and linked views:

    create materialized view f_v_cluster_pg_total_relation_sizes as 
    select current_database() as dbname,* from v_pg_total_relation_sizes
    union all
    select 'neighbour_db',* from fdw_v_pg_total_relation_sizes
    order by size desc;
    
    refresh materialized view f_v_cluster_pg_total_relation_sizes;
    
    select * from f_v_cluster_pg_total_relation_sizes;
    
    dbname relation readable_size size
    neighbour_db neighbour_schema.neighbour_table 22 MB 23076864
    postgres public.local_table 15 MB 15736832
    postgres public.f_v_cluster_pg_total_relation_sizes 16 kB 16384
    postgres public.pg_temp_16464 8192 bytes 8192
    postgres public.dblink_pkey_results 0 bytes 0
    postgres public.fdw_v_pg_total_relation_sizes 0 bytes 0
    postgres public.v_pg_total_relation_sizes 0 bytes 0
    neighbour_db neighbour_schema.v_pg_total_relation_sizes 0 bytes 0

    Note that this view needs to be redefined whenever new databases are added, which can be automated through dynamic SQL in a PL/pgSQL routine, although not fully since neither a regular trigger on pg_databse can be created, nor an event trigger on ddl_statement_start/_end can fire on a create database.