Search code examples
mysqlgroupwise-maximum

SQL - obtain latest record in subquery subset or in join using GROUP BY


I've been puzzling over this for some time now and realise its time to ask for some help. As part of an integration with a third party system, I have introduced an additional table to an existing system to manage synchronization of records. I'll shrink down the table structures to just include sufficient details to present the problem:

Table 1: data
Columns: (int)data_id*, (varchar)name, (datetime)date_created

Table 2: sync
Columns: (int)sync_id*, (int)data_id, (int)result, (varchar)details,
         (datetime)date_created

* denotes primary index

When the third party system attempts to synchronize records, it uses an HTTP GET request and a server-side script returns an XML response containing the record information which is waiting to be synchronized for the first time (no sync records will exist for this data_id), and also record information which is awaiting a retry at synchronization having failed on a previous attempt (the most recent sync record for this data_id will have a result value of 0). The third party system then uses an HTTP POST request to a different server-side script which provides feedback as to which records it was able to match and successfully synchronize (result = 1) and which records it was not able to match and synchronize with (result = 0, details = "Error Message").

There will frequently be multiple sync records for each data record since there are a number of valid scenarios which may prevent a successful synchronization without a user first taking some action.

It is important to keep each synchronization attempt logged and therefore to just add a sync column to the data table is not acceptable.

The pseudo-code version of what I'm trying to get working with SQL is along these lines:

  1. Fetch all the records from the data table and alongside each record.
  2. And for each of those records find the latest sync record by matching the data_id, ordering the sync records in descending order (newest at the top), and limiting the sync records to just 1 (we only need the most recent sync record for this query).
  3. Show the columns from both the data and latest sync record. If no sync record exists, the data record should still be shown and the sync columns simply populated with NULL values.

The closest working SQL statement I have so far uses a LEFT JOIN:

SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
FROM (
    SELECT data_id, name, date_created
    FROM data
) AS d 
LEFT JOIN (
    SELECT sync_id, data_id, result, details, date_created
    FROM sync 
    GROUP BY data_id
) AS s 
ON d.data_id = s.data_id 
ORDER BY d.date_created DESC;

Unfortunately this does not seem to take the most recent sync record but the GROUP BY data_id seems to simply grab the first sync record it finds. MySQL syntax won't allow me to place an ORDER BY date_created DESC before the GROUP BY data_id line. If I place this ORDER BY statement after the GROUP BY line it seems to take no effect, and the most recent sync record is not the one shown alongside the data columns in the results.

A simpler version I started off with that has the exact same problem of not necessarily taking the most recent sync record:

SELECT d.data_id, d.name, d.date_created, s.result, s.details 
FROM data AS d LEFT JOIN sync AS s ON d.data_id = s.data_id 
WHERE s.result = 0 OR s.result IS NULL;

I've also tried using a sub-query to achieve this, again same problem:

SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
FROM (
    SELECT data_id, name, date_created
    FROM data
) AS d, (
    SELECT s.sync_id, s.data_id, s.result, s.details, s.date_created
    FROM sync AS s, data AS d
    WHERE s.data_id = d.data_id
    ORDER BY s.date_created DESC
) AS s 
WHERE d.data_id = s.data_id 
ORDER BY s.date_created DESC;

Please can someone advise how I can be certain to get only the most recent sync record alongside the data records in a single query. I'm happy for the solution to involve any combination of joins or sub-queries as required. Thank you.


Solution

  • You just need to use order by for sync table in your query then group by as mysql does not support order by before group by because mysql uses first group by then order by.

    So you can use below query.

    SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
    FROM `data` AS d LEFT JOIN (
      SELECT sync_id, data_id, result, details, date_created FROM (
        SELECT sync_id, data_id, result, details, date_created
        FROM sync
        ORDER BY date_created DESC
      ) a GROUP BY a.data_id
    ) s
    ON d.data_id = s.data_id 
    ORDER BY d.date_created DESC;