Would the following script be appropriate for populating the date_dim table? I'm using sqlalchemy to populate a temporary table in MySQL with a PANDAS DataFrame. The purpose for the SQL script below is to populate dates_dim with the values of dates_dim_temp if they are not already there. I also intend to create new primary keys for new entries into date_dim but want to check whether I am on the right track with this script before proceeding:
INSERT INTO date_dim (id, created_at, week_day, month_day, month_num, `year`)
SELECT DISTINCT *
FROM date_dim_temp
WHERE NOT EXISTS (SELECT * FROM date_dim_temp ddt JOIN date_dim dd ON dd.created_at = ddt.created_at);
You don't need the join in the subquery of NOT EXISTS
if you only need to check the existence of the column of the column created_at
:
INSERT INTO date_dim (id, created_at, week_day, month_day, month_num, `year`)
SELECT DISTINCT ddt.*
FROM date_dim_temp ddt
WHERE NOT EXISTS (SELECT * FROM date_dim dd WHERE dd.created_at = ddt.created_at);
Also if there is a UNIQUE
constraint for created_at
in date_dim
then you don't need NOT EXISTS
. Use INSERT IGNORE
:
INSERT IGNORE INTO date_dim (id, created_at, week_day, month_day, month_num, `year`)
SELECT DISTINCT ddt.*
FROM date_dim_temp ddt
This will reject also any other rows that would violate any existing constraint.