I'm using MySQL tables.
Sample table EMP
(primary_key
on (EID, DOJ)
) :
ENAME | EID | DEPT | SAL | DOJ (YYYY-MM-DD) |
---|---|---|---|---|
A | 6 | ee | 2000 | 2021-03-01 |
B | 5 | me | 2020 | 2021-04-30 |
C | 3 | it | 2000 | 2020-12-27 |
D | 4 | cv | 2020 | 2020-10-31 |
E | 1 | it | 2000 | 2021-01-01 |
F | 2 | it | null | 2021-02-28 |
G | 7 | ee | null | 2020-11-20 |
I have one job that deletes and inserts data into a similarly structured table.
The process needs to delete non-monthly data. That is, it will delete rows with EID
6, 3, 1, 7
: These rows have DOJ
that is not month-end.
The query I came up with:
-- DOJ is of type DATE
DELETE FROM EMP WHERE LAST_DAY(DOJ) <> DOJ ;
It works as expected. But, because there is large data (~5 million) this is slow.
I understand that because of LAST_DAY()
function, I'm loosing the index
advantage of DOJ
.
Can you please suggest how can I improve the query?
You are right about the index. It is no use here. The only idea that comes to mind is a generated column to tell you about the date being the month's end or not. Thus you'd have a column you can index and use in your query:
create table emp
(
ename varchar(100),
...
doj date,
is_month_end bool as (doj = last_day(doj))
);
create index idx_month_ends on emp (is_month_end);
delete from emp where not is_month_end;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=397388b70bb1f459bbefce630ad27ac4
An index can only help, though, if this is about a very small part of the data in the table, say 1%. With many more rows to delete it makes more sense to read the whole table.