Search code examples
mysqlmysql-slow-query-log

NOT IN subquery takes too long to execute


How do I optimize this query to get the same result, without taking as long? The NOT IN subquery takes a very long time.

SELECT DISTINCT EmployeeId FROM employees
    WHERE 
    status = 'Active' 
    && BranchId = '2' 
    && NOT EXISTS (
      SELECT * FROM attendance
      WHERE
      employees.EmployeeId = attendance.EmployeeId 
      && attendance.AttendanceDate = '2015-01-20'
    )
  )

SELECT EmployeeId FROM employees 
    WHERE 
    status = 'Active' 
    && BranchId = '2' 
    && NOT IN (
      SELECT EmployeeId FROM attendance WHERE AttendanceDate='2015-01-20'
    )

Solution

  • Here is another version of your query

    select
    distinct e.EmployeeId FROM employees e
    left join attendance a on e.EmployeeId = a.EmployeeId and a.AttendanceDate = '2015-01-20'
    where
    e.status='Active' 
    and e.BranchId= '2' 
    and a.EmployeeId is null
    

    You will also need some indexes to be applied on the tables as

    alter table employees add index st_br_idx(status,BranchId);
    alter table AttendanceDate add index AttendanceDate_idx(AttendanceDate);
    

    if EmployeeId is a foreign key then no need to add the index else if the index is not already there you may need the following as well

    alter table AttendanceDate add index EmployeeId_idx(EmployeeId);
    

    If EmployeeId is a primary key in employees then its already indexed if not and not indexed you may need to add index for that as well

    alter table employees add index EmployeeId_idx(EmployeeId);
    

    You may also check your original query after having above indexes

    SELECT DISTINCT e.EmployeeId FROM employees e 
    WHERE 
    e.status='Active' 
    and e.BranchId= '2' 
    and NOT EXISTS (
      SELECT 1 FROM 
      attendance a WHERE e.EmployeeId = a.EmployeeId 
      and a.AttendanceDate='2015-01-20'
    )
    

    To analyze the query use explain select.. and see how optimizer is using the indexes and possible number of rows the optimizer may scan for retrieving the records