Search code examples
mysqlsqljoingroup-bymaxdate

Can I get the max date of a group without doing a self join?


I have this monstrous query here:

SELECT t.technician_id AS techID, CONCAT(t.first_name, ' ', t.last_name) AS technicianName, h.date, p.name
FROM technician t
JOIN technician_date td ON t.technician_id = td.technician_id
JOIN hours_input_on_date h ON h.id = td.hours_input_on_date_id
JOIN technician_date_project tdp ON tdp.technician_date_id = td.technician_date_id
JOIN projects p ON p.id = tdp.project_id
JOIN(
    SELECT h.date, p.name
    FROM technician t
    JOIN technician_date td ON t.technician_id = td.technician_id
    JOIN hours_input_on_date h ON h.id = td.hours_input_on_date_id
    JOIN technician_date_project tdp ON tdp.technician_date_id = td.technician_date_id
    JOIN projects p ON p.id = tdp.project_id
    GROUP BY h.date, p.name
    HAVING COUNT(*) > 1) temp ON h.date = temp.date AND p.name = temp.name
JOIN(
    SELECT MAX(date) AS latestDate, name
    FROM(
        SELECT h.date, p.name
        FROM technician t
        JOIN technician_date td ON t.technician_id = td.technician_id
        JOIN hours_input_on_date h ON h.id = td.hours_input_on_date_id
        JOIN technician_date_project tdp ON tdp.technician_date_id = td.technician_date_id
        JOIN projects p ON p.id = tdp.project_id
        GROUP BY h.date, p.name
        HAVING COUNT(*) > 1) t
    GROUP BY t.name) temp1 ON temp1.latestDate = temp.date AND temp1.name = temp.name;

What this is doing is getting the technicians and projects for cases when more than one technician worked on the same project on the same day. See my previous question for clarification. This is what the first group of joins, and the first join subquery accomplish.

Now, I would just like to get the latest group of techs who worked on a project on the same day. I've added another self join that gets the max date from the group of projects where multiple techs worked on the same day. Is there a more efficient way to remove that last JOIN block and still limit the grouping to the latest date, or is a self-join the only way to go?

For more clarification, let's say I took the last JOIN block out, I would see something like this:

| tech |    date    | project |
+------+------------+---------+
| Adam | 2014-12-04 |  Math   |
| John | 2014-12-04 |  Math   |
| Jane | 2014-12-04 |  Math   |
| Adam | 2014-12-05 |  Math   |
| John | 2014-12-05 |  Math   |

I would like to only see the last two rows, because it was the most recent group for that project, but can I do it more efficiently than joining all of my tables again?

EDIT

Here is an SQL Fiddle that helps demonstrate the differences.


Solution

  • you could nest a "max(date) group by name" in your subquery:

    SELECT t.technician_id AS techID, CONCAT(t.first_name, ' ', t.last_name) AS technicianName, h.date, p.name
    FROM technician t
    JOIN technician_date td ON t.technician_id = td.technician_id
    JOIN hours_input_on_date h ON h.id = td.hours_input_on_date_id
    JOIN technician_date_project tdp ON tdp.technician_date_id = td.technician_date_id
    JOIN projects p ON p.id = tdp.project_id
    JOIN(
    
        SELECT max(date) date, name from (
    
        SELECT h.date, p.name
        FROM technician t
        JOIN technician_date td ON t.technician_id = td.technician_id
        JOIN hours_input_on_date h ON h.id = td.hours_input_on_date_id
        JOIN technician_date_project tdp ON tdp.technician_date_id = td.technician_date_id
        JOIN projects p ON p.id = tdp.project_id
        GROUP BY h.date, p.name
        HAVING COUNT(*) > 1
    
        ) subtemp
        group by name
    
    ) temp ON h.date = temp.date AND p.name = temp.name;