Search code examples
postgresqldbeaverpostgresql-14

PostgreSQL - Find row data size in GB, for each five-year period


I'm trying to find size of data in the table year wise.

I have a large table around 150 columns and billions of records, and also having date column in it based on year I want to show the size of data in the GB.

I found we can get table size using SELECT pg_size_pretty( pg_total_relation_size('tablename') );

but didn't understand how to get actual data size based on condition.

Expected Output:

year   |   size(GB)
--------------------
2010     10
2015     40
2020     80

Solution

  • More or less as per the comment:
    demo at db<>fiddle

    select date_part('year', created_at)-date_part('year', created_at)::int % 5
           as year
          ,pg_size_pretty(sum(pg_column_size(test)))
    from test
    group by 1
    order by 1;
    
    year pg_size_pretty
    1985 100 kB
    1990 5396 kB
    1995 8598 kB
    2000 12 MB
    2005 16 MB
    2010 18 MB
    2015 21 MB
    2020 24 MB

    The size includes the date-type created_at column - I'm feeding entire rows of test into pg_column_size() without unpacking the fields, same way you can select x from x - that's the table name used as a single selected thing. Group by date_trunc() or date_part()/extract(), subtracting modulo 5 (rounding down to 0 or 5 in units) and sum() that.

    To only measure the payload and not the dates, you will need to sum pg_column_size() of each column with a bunch of additions + (there's no variadic sum() function out of the box). Not date_bin() after all, because it's limited to less than a month per bucket:

    The stride interval must be greater than zero and cannot contain units of month or larger.

    The demo shows an example of how to dynamically generate the uglier query that excludes dates from the sum of sizes, based on information_schema.columns.

    The result won't include how much space the values take up in the indexes, only how much they occupy in the table and toast, post compression:

    If applied directly to a table column value, this reflects any compression that was done.

    If the table is partitioned by the date column, you can collect pg_total_relation_size() for all partitions, union that and aggregate over it, which would also tell you how much space their indexes take.