Search code examples
sqlpostgresqldatabase-administrationpostgresql-performancebulkupdate

Column Copy and Update vs. Column Create and Insert


I have a table with 32 Million rows and 31 columns in PostgreSQL 9.2.10. I am altering the table by adding columns with updated values.

For example, if the initial table is:

id     initial_color
--     -------------
1      blue
2      red
3      yellow

I am modifying the table so that the result is:

id     initial_color     modified_color
--     -------------     --------------
1      blue              blue_green
2      red               red_orange
3      yellow            yellow_brown

I have code that will read the initial_color column and update the value.

Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this? My present choices are:

  1. Copy the column and update the rows in the new column
  2. Create an empty column and insert new values

I could do either option with one column at a time or with all five at once. The columns types are either character varying or character.


Solution

  • The columns types are either character varying or character.

    Don't use character, that's a misunderstanding. varchar is ok, but I would suggest just text for arbitrary character data.

    Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this?

    If you don't have objects (views, foreign keys, functions) depending on the existing table, the most efficient way is create a new table. Something like this ( details depend on the details of your installation):

    BEGIN;
    LOCK TABLE tbl_org IN SHARE MODE;  -- to prevent concurrent writes
    
    CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);
    
    ALTER tbl_new ADD COLUMN modified_color text
                , ADD COLUMN modified_something text;
                -- , etc
    INSERT INTO tbl_new (<all columns in order here>)
    SELECT <all columns in order here>
        ,  myfunction(initial_color) AS modified_color  -- etc
    FROM   tbl_org;
    -- ORDER  BY tbl_id;  -- optionally order rows while being at it.
    
    -- Add constraints and indexes like in the original table here
    
    DROP tbl_org;
    ALTER tbl_new RENAME TO tbl_org;
    COMMIT;
    

    If you have depending objects, you need to do more.

    Either was, be sure to add all five at once. If you update each in a separate query you write another row version each time due to the MVCC model of Postgres.

    Related cases with more details, links and explanation:

    While creating a new table you might also order columns in an optimized fashion: