Tags: sql, snowflake-cloud-data-platform, rdbms

How to optimally add columns to a large table?


I have a table with millions of rows of data. For some new incoming data, I now have to store some additional properties. Altering the table at this stage to add a new column would presumably consume a lot of computing power because the table already holds millions of rows. What is the optimal way to store this new data if such new properties are added from time to time?

Edit: Following a suggestion from a user, I am adding the platform I am using: Snowflake Cloud Data Platform.


Solution

  • Don't worry too much about the computing power needed to add columns - in Snowflake it's mostly a metadata (logical) change, not a physical rewrite of the data.

    I did this test with an 8.4M-row table:

    
    create table clone1
    clone my_big_table;
    
    alter table clone1
    add column justtest NUMBER(38,0);
    
    -- adding a column was a quick operation, just metadata probably (210ms)
    
    create table clone2
    clone my_big_table;
    
    alter table clone2
    add column justtest NUMBER(38,0) default 7;
    
    -- adding a column with a default value was a quick operation too, just metadata probably (256ms)
    
    select justtest
    from clone2
    limit 10;
    
    -- correct data returned
    
    create table clone3
    clone my_big_table;
    
    alter table clone3
    add column justtest NUMBER(38,0) default 7;
    
    -- again, adding a column with a default value was quick
    
    update clone3
    set justtest=1;
    
    -- this took much longer - updating existing rows rewrites the data (1min 18s)
    
    

    Adding a column to a table shouldn't be a problem - just test the operation on a clone of your table first. If you want to confirm the timings on your own data, see the sketch below.
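
    If it helps, here is a minimal sketch (my own addition, not part of the original test) for checking those timings from the recent query history, assuming the clone names used above. Note that INFORMATION_SCHEMA.QUERY_HISTORY only covers recent queries and reports TOTAL_ELAPSED_TIME in milliseconds.

    
    -- Hypothetical check: compare how long the ALTER and UPDATE statements
    -- above actually took, using the session's recent query history.
    select query_text,
           total_elapsed_time / 1000 as elapsed_seconds
    from table(information_schema.query_history())
    where query_text ilike 'alter table clone%'
       or query_text ilike 'update clone%'
    order by start_time desc;
    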