We want to map the entries of the calibration_data to the calibration data by following query. But the duration of this query is quite too long in my opinion (>24h).
Is there any optimization possible? We added for testing more Indexes as needed right now but it didn't had any impact on the duration.
[Edit]
The hardware shouldn't be the biggest bottleneck
EXPLAIN result
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
| 1 | SIMPLE | cal | NULL | ALL | NULL | NULL | NULL | NULL | 2009 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | m | NULL | ALL | visit | NULL | NULL | NULL | 3082466 | 100.00 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------+
Query which takes too long:
Insert into knn_data (SELECT cal.X AS X,
cal.Y AS Y,
cal.BeginTime AS BeginTime,
cal.EndTime AS EndTime,
avg(m.dbm_ant) AS avg_dbm_ant,
m.ant_id AS ant_id,
avg(m.location) avg_location,
count(*) AS count,
m.visit
FROM calibration cal
LEFT join calibration_data m
ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
GROUP BY cal.X,
cal.Y,
cal.BeginTime,
cal. BeaconId,
m.ant_id,
m.macHash,
m.visit;
Table knn_data:
CREATE TABLE `knn_data` (
`X` int(11) NOT NULL,
`Y` int(11) NOT NULL,
`BeginTime` datetime NOT NULL,
`EndTIme` datetime NOT NULL,
`avg_dbm_ant` float DEFAULT NULL,
`ant_id` int(11) NOT NULL,
`avg_location` float DEFAULT NULL,
`count` int(11) DEFAULT NULL,
`visit` datetime NOT NULL,
PRIMARY KEY (`ant_id`,`visit`,`X`,`Y`,`BeginTime`,`EndTIme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Table calibration
BeaconId, X, Y, BeginTime, EndTime
41791, 1698, 3944, 2016-11-12 22:44:00, 2016-11-12 22:49:00
CREATE TABLE `calibration` (
`BeaconId` int(11) DEFAULT NULL,
`X` int(11) DEFAULT NULL,
`Y` int(11) DEFAULT NULL,
`BeginTime` datetime DEFAULT NULL,
`EndTime` datetime DEFAULT NULL,
KEY `x,y` (`X`,`Y`),
KEY `x` (`X`),
KEY `y` (`Y`),
KEY `BID` (`BeaconId`),
KEY `beginTime` (`BeginTime`),
KEY `x,y,beg,bid` (`X`,`Y`,`BeginTime`,`BeaconId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Table calibration_data
macHash, visit, dbm_ant, ant_id, mac, isRand, posX, posY, sources, ip, dayOfMonth, location, am, ar
'f5:dc:7d:73:2d:e9', '2016-11-12 22:44:00', '-87', '381', 'f5:dc:7d:73:2d:e9', NULL, NULL, NULL, NULL, NULL, '12', '18.077636300207715', 'inradius_41791', NULL
CREATE TABLE `calibration_data` (
`macHash` varchar(100) COLLATE utf8_bin NOT NULL,
`visit` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dbm_ant` int(3) NOT NULL,
`ant_id` int(11) NOT NULL,
`mac` char(17) COLLATE utf8_bin DEFAULT NULL,
`isRand` tinyint(4) DEFAULT NULL,
`posX` double DEFAULT NULL,
`posY` double DEFAULT NULL,
`sources` int(2) DEFAULT NULL,
`ip` int(10) unsigned DEFAULT NULL,
`dayOfMonth` int(11) DEFAULT NULL,
`location` varchar(80) COLLATE utf8_bin DEFAULT NULL,
`am` varchar(300) COLLATE utf8_bin DEFAULT NULL,
`ar` varchar(300) COLLATE utf8_bin DEFAULT NULL,
KEY `visit` (`visit`),
KEY `macHash` (`macHash`),
KEY `ant, time` (`dbm_ant`,`visit`),
KEY `beacon` (`am`),
KEY `ant_id` (`ant_id`),
KEY `ant,mH,visit` (`ant_id`,`macHash`,`visit`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
That's a nasty and classical one on "range" queries: the optimiser doesnt use your indexes and end up in a full table scan. In your explain plan ou can see this on column type=ALL
.
Ideally you should have type=range
and something in the key column
Some ideas:
I doubt that changing you jointure from
ON m.visit BETWEEN cal.BeginTime AND cal.EndTime
to
ON m.visit >= cal.BeginTime AND m.visit <= cal.EndTime
will work, but still give it a try.
Do trigger an ANALYSE TABLE
on both tables. This is will update the stats on your tables and might help the optimiser to take the right decision (ie using the indexes)
Change the query to this might also help to force the optimiser use indexes :
Insert into knn_data (SELECT cal.X AS X,
cal.Y AS Y,
cal.BeginTime AS BeginTime,
cal.EndTime AS EndTime,
avg(m.dbm_ant) AS avg_dbm_ant,
m.ant_id AS ant_id,
avg(m.location) avg_location,
count(*) AS count,
m.visit
FROM calibration cal
LEFT join calibration_data m
ON m.visit >= cal.BeginTime
WHERE m.visit <= cal.EndTime
GROUP BY cal.X,
cal.Y,
cal.BeginTime,
cal. BeaconId,
m.ant_id,
m.macHash,
m.visit;
That's all I am thinking off...