Tags: postgresql, indexing, primary-key, auto-increment

Serial column takes up disproportionate amount of space in PostgreSQL


I would like to create an auto-incrementing id column that is not a primary key in a PostgreSQL table. The table currently has just over 200M rows and 14 columns.

SELECT pg_size_pretty(pg_total_relation_size('mytable'));

The above query reveals that mytable takes up 57 GB on disk. Checking with df -h (on Ubuntu 20.04) shows I currently have 30 GB of free disk space remaining.

What I don't understand is why, when I try to create a SERIAL column, I completely run out of disk space and the query never finishes. I run the following command:

ALTER TABLE mytable ADD COLUMN id SERIAL;

and then watch my disk space gradually run out until there is nothing left and the query fails. I am no database expert, but this does not make sense to me. Why would a simple serial column take up more than half the space of the table itself, especially when it is not a primary key and therefore has no index? Is there a known workaround for creating such an auto-incrementing id column?


Solution

  • As a proof of concept:

    create table id_test(pk_fld integer primary key generated always as identity);
    -- FYI: in Postgres 14+ the OVERRIDING SYSTEM VALUE clause below is not needed.
    -- It is a workaround for a bug in Postgres 13 and earlier.
    insert into id_test overriding system value values (default), (default);
    select * from id_test;
     pk_fld 
    --------
          1
          2
    -- Add a plain integer column; with no default there is no table rewrite.
    alter table id_test add column id_fld integer;
    -- Backfill it with a placeholder value.
    update id_test set id_fld = 0;
    -- An identity column must be NOT NULL.
    alter table id_test alter column id_fld set not null;
    -- Attach the identity sequence to the column.
    alter table id_test alter column id_fld add generated always as identity;
    -- Replace the placeholders with generated values.
    update id_test set id_fld = default;
    select * from id_test;
     pk_fld | id_fld 
    --------+--------
          1 |      1
          2 |      2
    
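    As a quick sanity check, one more insert should pick up both sequences where they left off (the expected output is shown below; as above, OVERRIDING SYSTEM VALUE is only needed on Postgres 13 and earlier):

    insert into id_test overriding system value values (default, default);
    select * from id_test;
     pk_fld | id_fld 
    --------+--------
          1 |      1
          2 |      2
          3 |      3
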

    Basically this breaks the process down into steps. (For background: ALTER TABLE mytable ADD COLUMN id SERIAL implies a volatile nextval() default, which forces PostgreSQL to rewrite the entire table; the old and new copies coexist on disk until the rewrite finishes, so with only 30 GB free a rewrite of a 57 GB table cannot complete.) Obviously this is just a toy table and not representative of your setup. I would try it on a test table that is a subset of your actual table to see what happens to disk space consumption; a sketch follows below. Note that each full-table UPDATE also leaves behind a dead version of every row, so it would not hurt to run VACUUM after the updates to make that space reusable.
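
    A rehearsal on a subset might look something like this (mytable_test and the 1M-row sample size are made up for illustration; the pg_size_pretty calls simply track how much space each step consumes):

    -- Build a sample of the real table to rehearse the migration on.
    create table mytable_test as
    select * from mytable limit 1000000;
    select pg_size_pretty(pg_total_relation_size('mytable_test'));  -- baseline size

    -- Same step-by-step recipe as the toy example above.
    alter table mytable_test add column id integer;
    update mytable_test set id = 0;
    alter table mytable_test alter column id set not null;
    alter table mytable_test alter column id add generated always as identity;
    update mytable_test set id = default;

    vacuum mytable_test;  -- reclaim the dead row versions left by the updates
    select pg_size_pretty(pg_total_relation_size('mytable_test'));  -- compare to baseline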