Search code examples
databasepostgresqlsize

PostgreSQL column size vs table size


I am trying to get only one table size, and then size of each row. While I was using

SELECT pg_table_size(*mytable*) as DATA;

I get 393216 bytes.

Using

SELECT (SUM(pg_column_size(t) - 24)) FROM *mytable* AS t;

(as written here How to get each row size of a particular table in postgresql..?)

I get 560669 bytes.

560669 vs 393216 bytes - which one is real?


Solution

  • From https://www.postgresql.org/docs/14/functions-admin.html

    pg_table_size - Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map).

    So pg_table_size gives you the amount of disk postgres is using for the table and for some metadata that postgres keeps about the table (Visibility Map and Free Space Map). Deleting a row will not decrease this number (unless you do a VACUUM FULL), so we wouldn't expect the disk used by a table to match the sum of the data in each visible row. Instead, the disk used by a table would be larger.

    pg_column_size - Shows the number of bytes used to store any individual data value. If applied directly to a table column value, this reflects any compression that was done.

    So this returns the size of each row on disk (including the row-header information stored on disk).

    I'm not sure whether you'd consider the row header information 'real', but it does take up space on your harddrive, so whether this is correct or not depends on your use case.

    Using an example table from a database I have:

    SELECT pg_table_size('users')
    -- 3751936 <-- the size of my 'users' table on disk, including table meta-data
    SELECT (SUM(pg_column_size(t.*))) FROM users AS t;
    -- 3483028 <-- the total size on disk of the visible rows, including row "header" metadata.
    SELECT (SUM(pg_column_size(t.*)-24)) FROM users AS t;
    -- 3069412 <-- the size of the data in visible rows, excluding row "header" meta-data
    

    We'd expect each of these queries to return different numbers, and each is useful for a different purpose.

    As for the specific numbers that you've posted (with the pg_column_size being larger than the pg_table_size) I can't explain.