I am trying to insert 2x 700,000 records into an InnoDB table, and in my opinion it is rather slow. I have tried several things, but I am not sure what the most efficient way of inserting is.
Create table SQL:
DROP TABLE IF EXISTS `booking_daily_analysis`;
CREATE TABLE IF NOT EXISTS `booking_daily_analysis` (
`id` INT NOT NULL AUTO_INCREMENT,
`booking_id` INT NULL,
`action_id` INT NOT NULL,
`creative_id` INT NULL,
`position_id` INT NULL,
`profile_id` INT NULL,
`start` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`end` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`hits` INT NOT NULL DEFAULT 0,
`uniqueHits` INT NOT NULL DEFAULT 0 COMMENT 'contacts van vroeger',
PRIMARY KEY (`id`,`action_id`)
#INDEX `booking_id_idx` (`booking_id` ASC),
#FOREIGN KEY (`booking_id`) REFERENCES `booking` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
#INDEX `creative_id_idx` (`creative_id` ASC),
#FOREIGN KEY (`creative_id`) REFERENCES `creative` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
#INDEX `position_id_idx` (`position_id` ASC),
#FOREIGN KEY (`position_id`) REFERENCES `position` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
#INDEX `action_id_idx` (`action_id` ASC),
#FOREIGN KEY (`action_id`) REFERENCES `action` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,
#INDEX `profile_id_idx` (`profile_id` ASC),
#FOREIGN KEY (`profile_id`) REFERENCES `profile` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARACTER SET=utf8;
As you can see, there are a lot of indexes and foreign keys (InnoDB needs an index on each foreign key column), but indexes slow inserting way down, so I am trying to add them after my inserts through ALTER statements:
START TRANSACTION;
alter table `booking_daily_analysis` add index `booking_id_idx` (`booking_id` ASC), add constraint `fk_booking_id` foreign key (`booking_id`) REFERENCES `booking` (`id`) on delete set null on update cascade;
alter table `booking_daily_analysis` add index `creative_id_idx` (`creative_id` ASC), add constraint `fk_creative_id` foreign key (`creative_id`) references `creative` (`id`) on delete set null on update cascade;
alter table `booking_daily_analysis` add index `position_id_idx` (`position_id` ASC), add constraint `fk_position_id` foreign key (`position_id`) references `position` (`id`) on delete set null on update cascade;
alter table `booking_daily_analysis` add index `action_id_idx` (`action_id` ASC), add constraint `fk_action_id` foreign key (`action_id`) references `action` (`id`) on delete set null on update cascade;
alter table `booking_daily_analysis` add index `profile_id_idx` (`profile_id` ASC), add constraint `fk_profile_id` foreign key (`profile_id`) references `profile` (`id`) on delete set null on update cascade;
COMMIT;
I am not sure if the transaction is necessary there.
At the top of my script I have specified these options:
SET foreign_key_checks=0;
SET unique_checks=0;
The bottom:
SET unique_checks = 1;
SET foreign_key_checks = 1;
And the 2x 700,000 insert statements (only two statements for all the rows):
START TRANSACTION;
insert into nrc.booking_daily_analysis (id, action_id, start, end, hits, uniqueHits, position_id, booking_id, creative_id, profile_id)
select id, 1, start, end, impressions, contacts, position_id, booking_id, creative_id, new_profile_id from adhese_nrc.temp_ad_slot_ids;
COMMIT;
START TRANSACTION;
-- Insert clicks for click action (click action is 2)
insert into nrc.booking_daily_analysis (id, action_id, start, end, hits, uniqueHits, position_id, booking_id, creative_id, profile_id)
select id, 2, start, end, clicks, 0, position_id, booking_id, creative_id, new_profile_id from adhese_nrc.temp_ad_slot_ids;
COMMIT;
As you can see, the only difference between the two inserts is the action ID (1 -> 2).
So I was wondering: is this the way to go, or am I missing something here?
Latest output from MySQL Workbench:
14:32:13 START TRANSACTION 0 row(s) affected 0.000 sec
14:32:13 FIRST INSERT 717718 row(s) affected Records: 717718 @ 11.263 sec
14:32:24 COMMIT 0 row(s) affected 0.020 sec
14:32:24 START TRANSACTION 0 row(s) affected 0.000 sec
14:32:24 SECOND INSERT 717718 row(s) affected Records: 717718 @ 21.268 sec
14:32:46 COMMIT 0 row(s) affected 0.011 sec
14:32:46 START TRANSACTION 0 row(s) affected 0.000 sec
14:32:46 add index `booking_id_idx` 1435436 row(s) affected Records: 1435436 @ 39.393 sec
14:33:25 add index `creative_id_idx` 1435436 row(s) affected Records: 1435436 @ 68.801 sec
14:34:34 add index `position_id_idx` 1435436 row(s) affected Records: 1435436 Duplicates: 0 Warnings: 0 @ 142.877 sec
14:36:57 add index `action_id_idx` 1435436 row(s) affected Records: 1435436 Duplicates: 0 Warnings: 0 @ 162.160 sec
14:40:00 add index `profile_id_idx` 1435436 row(s) affected Records: 1435436 Duplicates: 0 Warnings: 0 @ 763.309 sec
This manual page also suggests changing the innodb_autoinc_lock_mode.
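Setting it to 2 (interleaved lock mode) avoids the table-level AUTO-INC lock during bulk inserts. It is a startup option, so it goes in the server configuration; a sketch, the value shown is the one relevant to bulk loads:

```ini
[mysqld]
# 2 = interleaved: no table-level AUTO-INC lock, fastest for bulk inserts.
# Only safe when replication (if any) uses row-based binary logging.
innodb_autoinc_lock_mode = 2
```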
If you do not need binary logging, disable it.
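One way to do that without touching the server configuration is to skip logging for the current session only; a sketch, assuming you have the SUPER privilege and no replica needs to receive these rows:

```sql
-- Skip binary logging for this session only (requires SUPER privilege).
SET sql_log_bin = 0;
-- ... run the bulk inserts here ...
SET sql_log_bin = 1;
```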
Increasing the size of some InnoDB buffers could help too (the innodb_buffer_pool_size in particular).
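You can check the current value first; note that before MySQL 5.7.5 innodb_buffer_pool_size is not dynamic, so changing it means editing the configuration file and restarting the server:

```sql
-- Current buffer pool size, in bytes. A common rule of thumb is to
-- give it 50-80% of RAM on a dedicated database server.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```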
I believe using a transaction is not desirable in this case. When a small number of consecutive changes are applied within the same transaction, they may be optimized by being consolidated into a single write. With inserts this large, I believe you would only load your redo log uselessly.
This leads me to another suggestion: try inserting a smaller number of rows at a time, like this:
INSERT INTO destination
SELECT * FROM source LIMIT 0, 10000;
INSERT INTO destination
SELECT * FROM source LIMIT 10000, 10000; -- and so on
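One caveat with the LIMIT form: without an ORDER BY, the row order is not guaranteed to be stable between batches, and large offsets get progressively slower. If the source table has an integer primary key (assumed here to be `id`), batching by key range avoids both problems:

```sql
INSERT INTO destination
SELECT * FROM source WHERE id BETWEEN 1 AND 10000;
INSERT INTO destination
SELECT * FROM source WHERE id BETWEEN 10001 AND 20000; -- and so on
```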
Finally, if you have a lot of memory available, you may want to manually load the whole data into a temporary memory table, then insert into your destination from this memory table (possibly in small batches):
CREATE TEMPORARY TABLE destination_tmp LIKE source;
ALTER TABLE destination_tmp ENGINE=MEMORY;
INSERT INTO destination_tmp SELECT * FROM source;
INSERT INTO destination SELECT * FROM destination_tmp;
Make sure to have a large enough value for max_heap_table_size.
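For example, to raise the limit for the current session before building the MEMORY table (the 1 GB value is only illustrative; size it to your data):

```sql
SET SESSION max_heap_table_size = 1024 * 1024 * 1024; -- 1 GB, adjust as needed
```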