Search code examples
mysqldatabasedata-warehousestar-schema

Is this an appropriate way of populating dimension table with unique values?


Would the following script be appropriate for populating the date_dim table? I'm using sqlalchemy to populate a temporary table in MySQL with a PANDAS DataFrame. The purpose for the SQL script below is to populate dates_dim with the values of dates_dim_temp if they are not already there. I also intend to create new primary keys for new entries into date_dim but want to check whether I am on the right track with this script before proceeding:

INSERT INTO date_dim  (id, created_at, week_day, month_day, month_num, `year`) 
SELECT DISTINCT *
FROM date_dim_temp 
WHERE NOT EXISTS (SELECT * FROM date_dim_temp ddt JOIN date_dim dd ON  dd.created_at = ddt.created_at);

Solution

  • You don't need the join in the subquery of NOT EXISTS if you only need to check the existence of the column of the column created_at:

    INSERT INTO date_dim (id, created_at, week_day, month_day, month_num, `year`) 
    SELECT DISTINCT ddt.*
    FROM date_dim_temp ddt
    WHERE NOT EXISTS (SELECT * FROM date_dim dd WHERE dd.created_at = ddt.created_at);
    

    Also if there is a UNIQUE constraint for created_at in date_dim then you don't need NOT EXISTS. Use INSERT IGNORE:

    INSERT IGNORE INTO date_dim (id, created_at, week_day, month_day, month_num, `year`) 
    SELECT DISTINCT ddt.*
    FROM date_dim_temp ddt
    

    This will reject also any other rows that would violate any existing constraint.