Search code examples
postgresqlamazon-web-servicesamazon-rdspgadmin

AWS database single column adds extremely much data


I'm retrieving data from an AWS database using PgAdmin. This works well. The problem is that I have one column that I set to True after I retrieve the corresponding row, where originally it is set to Null. Doing so adds an enormous amount of data to my database.

I have checked that this is not due to other processes: it only happens when my program is running. I am certain no rows are being added, I have checked the number of rows before and after and they're the same.

Furthermore, it only does this when changing specific tables, when I update other tables in the same database with the same process, the database size stays the same. It also does not always increase the database size, only once every couple changes does the total size increase.

How can changing a single boolean from Null to True add 0.1 MB to my database?

I'm using the following commands to check my database makeup:

To get table sizes

SELECT
    relname as Table,
    pg_total_relation_size(relid) As Size,
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as External Size
FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;

To get number of rows:

SELECT schemaname,relname,n_live_tup 
  FROM pg_stat_user_tables 
  ORDER BY n_live_tup DESC;

To get database size:

SELECT pg_database_size('mydatabasename')

Solution

  • If you have not changed that then your fillfactor is at 100% on the table since that is the default.

    This means that every change in your table will mark the changed row as obsolete and will recreate the updated row. The issue could be even worse if you have indices on your table since those should be updated on every row change too. As you could imagine this hurts the UPDATE performance too.

    So technically if you would read the whole table and update even the smallest column after reading the rows then it would double the table size when your fillfactor is 100.

    What you can do is to ALTER your table lower the fillfactor on it, then VACUUM it:

    ALTER TABLE your_table SET (fillfactor = 90);
    VACUUM FULL your_table;
    

    Of course with this step your table will be about 10% bigger but Postgres will spare some space for your updates and it won't change its size with your process.

    The reason why autovacuum helps is because it cleans the obsoleted rows periodically and therefore it will keep your table at the same size. But it puts a lot of pressure on your database. If you happen to know that you'll do operations like you described in the opening question then I would recommend tuning the fillfactor for your needs.