Search code examples
mysqljoinleft-join

Mysql query to find rows based on other column value using left join


I have a MySQL table temp_employee as follows:

CREATE TABLE `temp_employee` (
  `employee_id` varchar(22) COLLATE utf8mb4_unicode_ci NOT NULL,
  `application_date` datetime NOT NULL,
  PRIMARY KEY (`employee_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

And data in the table as follows:

employee_id application_date
emp.10062808068 2023-03-30 07:09:31
emp.10036984442 2023-03-30 09:48:30
emp.10041499192 2023-03-30 09:58:55
emp.10012314367 2023-03-30 03:33:28

Insert query for temp_employee table:

INSERT INTO `temp_employee` (`employee_id`, `application_date`)
VALUES
    ('emp.10012314367', '2023-03-30 03:33:28'),
    ('emp.10036984442', '2023-03-30 09:48:30'),
    ('emp.10041499192', '2023-03-30 09:58:55'),
    ('emp.10062808068', '2023-03-30 07:09:31');

There is another table employee_logs as follows:

CREATE TABLE `employee_logs` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `employee_id` varchar(22) COLLATE utf8mb4_unicode_ci NOT NULL,
  `type` tinyint(1) NOT NULL,
  `status` tinyint(1) NOT NULL,
  `updated_time` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

The data in the employee_logs table as follows:

id employee_id type status updated_time
1 emp.10000000001 1 2 2021-04-30 10:19:31
2 emp.10000000002 2 2 2022-06-30 17:29:31
3 emp.10062808068 1 2 2023-03-30 11:19:20
4 emp.10000000003 2 3 2023-03-30 03:03:30
5 emp.10000000004 1 2 2023-03-30 04:04:40
6 emp.10012314367 2 1 2023-03-30 13:13:31
7 emp.10062808068 2 3 2023-03-20 17:24:10
8 emp.10036984442 1 2 2023-03-30 09:09:09
9 emp.10012314367 2 1 2023-03-20 10:10:10
10 emp.10012314367 2 2 2023-03-21 12:12:12

Insert query for employee_logs table:

INSERT INTO `employee_logs` (`id`, `employee_id`, `type`, `status`, `updated_time`)
VALUES
    (1, 'emp.10000000001', 1, 2, '2021-04-30 10:19:31'),
    (2, 'emp.10000000002', 2, 2, '2022-06-30 17:29:31'),
    (3, 'emp.10062808068', 1, 2, '2023-03-30 11:19:20'),
    (4, 'emp.10000000003', 2, 3, '2023-03-30 03:03:30'),
    (5, 'emp.10000000004', 1, 2, '2023-03-30 04:04:40'),
    (6, 'emp.10012314367', 2, 1, '2023-03-30 13:13:31'),
    (7, 'emp.10062808068', 2, 3, '2023-03-20 17:24:10'),
    (8, 'emp.10036984442', 1, 2, '2023-03-30 09:09:09'),
    (9, 'emp.10012314367', 2, 1, '2023-03-20 10:10:10'),
    (10, 'emp.10012314367', 2, 2, '2023-03-21 12:12:12');

I want Mysql query to get type, status & updated_time for each employee_id in temp_employee table which select single row from employee_logs table with condition status != 3 AND updated_time <= application_date.

Expected Result (No. of rows same as the count of temp_employee table)

employee_id type status updated_time
emp.10062808068 NULL NULL NULL
emp.10036984442 1 2 2023-03-30 09:09:09
emp.10041499192 NULL NULL NULL
emp.10012314367 2 2 2023-03-21 12:12:12

employee_logs table is huge with more than 40 million rows and can have multiple rows with the same employee_id.

Explanation on the employee_logs table which rows should picked by the condition (status <> 3 AND updated_time <= application_date), as follows:

1   emp.10000000001    1       2      2021-04-30 10:19:31 -- should not as emp_id not exists in temp table 
2   emp.10000000002    2       2      2022-06-30 17:29:31 -- should not as emp_id not exists in temp table
3   emp.10062808068    1       2      2023-03-30 11:19:20 -- should not return as updated_time > application_date 2023-03-30 07:09:31
4   emp.10000000003    2       3      2023-03-30 03:03:30 -- should not as emp_id not exists in temp table
5   emp.10000000004    1       2      2023-03-30 04:04:40 -- should not as emp_id not exists in temp table
6   emp.10012314367    2       1      2023-03-30 13:13:31 -- should not return as updated_time > application_date 2023-03-30 03:33:28
7   emp.10062808068    2       3      2023-03-20 17:24:10 -- should not return as status = 3
8   emp.10036984442    1       2      2023-03-30 09:09:09 -- should return as updated_time <= application_date && status != 3 
9   emp.10012314367    2       1      2023-03-20 10:10:10 -- should not return as updated_time < application_date 2023-03-30 03:33:28 but not latest
10  emp.10012314367    2       2      2023-03-21 12:12:12 -- should return as updated_time < application_date 2023-03-30 03:33:28

Any help in the query to achieve the desired output will be highly appreciated. TIA


Solution

  • The following query worked for me, the application_date column value in the temp_employee table was incorrect, after the fix the query return expected result.

    SELECT t.employee_id, oel.type, oel.status, oel.updated_time
    FROM (
        SELECT t.employee_id, MAX(el.updated_time) updated_time
        FROM temp_employee t LEFT JOIN employee_logs el
            ON t.employee_id = el.employee_id 
            AND el.status <> 3 
            AND el.updated_time <= t.application_date
        GROUP BY t.employee_id
    ) t LEFT JOIN employee_logs oel
    ON t.employee_id = oel.employee_id AND t.updated_time = oel.updated_time;
    

    SQL Fiddle link solution