I am trying to optimize a SQL query on a large events table (10 million+ rows) for a date-range search. The table already has a unique index on (lid, did, measurement, date). The query below tries to fetch the events for three measurement types (Kilowatts, Current and Voltage) at 2-second intervals of the date column:
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Voltage")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Current")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Kilowatts")
group by timekey
This is the table I am querying:
====================================================
id | lid | did | measurement | date
====================================================
 1 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:00
 2 |   1 |   1 | Current     | 2020-04-27 00:00:00
 3 |   1 |   1 | Voltage     | 2020-04-27 00:00:00
 4 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:01
 5 |   1 |   1 | Current     | 2020-04-27 00:00:01
 6 |   1 |   1 | Voltage     | 2020-04-27 00:00:01
 7 |   1 |   1 | Kilowatts   | 2020-04-27 00:00:02
 8 |   1 |   1 | Current     | 2020-04-27 00:00:02
 9 |   1 |   1 | Voltage     | 2020-04-27 00:00:02
====================================================
The expected result is all rows whose date is 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but it uses UNION to look up the different measurements, and I suspect that is not the optimal way to do it.
Can any SQL expert help me tune this query to improve its performance?
You have one record every second for each and every measurement, and you want to select one record every two seconds.
You could try:
select *
from events
where
    lid = 1
    and did = 1
    and measurement IN ('Voltage', 'Current', 'Kilowatts')
    -- keep only rows whose seconds part is even
    and extract(second from date) % 2 = 0
This would select records that have an even second part.
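Since the question mentions a date-range search, the same filter can be combined with explicit bounds on date. The following is only a sketch: the two boundary timestamps are made-up values for illustration, and whether the optimizer actually uses the (lid, did, measurement, date) index for the range is something to confirm with EXPLAIN on your own data.

```sql
-- Sketch: even-second filter plus a half-open date range.
-- The range bounds below are assumed values, not taken from the question.
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND date >= '2020-04-27 00:00:00'
  AND date <  '2020-04-28 00:00:00'
  -- This predicate wraps the column in a function, so it is evaluated
  -- on the rows selected by the conditions above rather than via the index.
  AND EXTRACT(SECOND FROM date) % 2 = 0
ORDER BY date, measurement;
```

The bare comparisons on date leave the column unwrapped, which is what allows a range scan on the last column of the index.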
Alternatively, if you always have one record every second, another option is row_number()
(this requires MySQL 8.0):
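select *
from (
    select
        e.*,
        -- number the rows of each measurement in date order
        row_number() over(partition by measurement order by date) rn
    from events e
    where
        lid = 1
        and did = 1
        and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
where rn % 2 = 1  -- keep the 1st, 3rd, 5th, ... row of each measurement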
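This is a bit less accurate than the previous query though: it keeps every other row of each measurement, counting from the first one, so a single missing second (or a first row that falls on an odd second) changes which timestamps are returned.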
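To make the difference concrete, here is a small self-contained comparison (MySQL 8.0). The four sample rows are invented for illustration and deliberately skip 00:00:02; the CTE name `sample` and the two output column names are likewise hypothetical.

```sql
-- Sketch: compare the two filters on sample data with one missing second.
WITH sample (measurement, date) AS (
    SELECT 'Voltage', TIMESTAMP '2020-04-27 00:00:00' UNION ALL
    SELECT 'Voltage', TIMESTAMP '2020-04-27 00:00:01' UNION ALL
    SELECT 'Voltage', TIMESTAMP '2020-04-27 00:00:03' UNION ALL  -- 00:00:02 is missing
    SELECT 'Voltage', TIMESTAMP '2020-04-27 00:00:04'
)
SELECT
    date,
    -- 1 if the row_number() approach would keep this row
    ROW_NUMBER() OVER (PARTITION BY measurement ORDER BY date) % 2 = 1
        AS kept_by_row_number,
    -- 1 if the even-second approach would keep this row
    EXTRACT(SECOND FROM date) % 2 = 0
        AS kept_by_even_second
FROM sample
ORDER BY date;
```

With this data, the row_number() approach keeps 00:00:00 and 00:00:03, while the even-second filter keeps 00:00:00 and 00:00:04.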