I've been puzzling over this for some time now and realise its time to ask for some help. As part of an integration with a third party system, I have introduced an additional table to an existing system to manage synchronization of records. I'll shrink down the table structures to just include sufficient details to present the problem:
Table 1: data
Columns: (int)data_id*, (varchar)name, (datetime)date_created
Table 2: sync
Columns: (int)sync_id*, (int)data_id, (int)result, (varchar)details,
(datetime)date_created
* denotes primary index
When the third party system attempts to synchronize records, it uses an HTTP GET request and a server-side script returns an XML response containing the record information which is waiting to be synchronized for the first time (no sync records will exist for this data_id), and also record information which is awaiting a retry at synchronization having failed on a previous attempt (the most recent sync record for this data_id will have a result value of 0). The third party system then uses an HTTP POST request to a different server-side script which provides feedback as to which records it was able to match and successfully synchronize (result = 1
) and which records it was not able to match and synchronize with (result = 0, details = "Error Message"
).
There will frequently be multiple sync records for each data record since there are a number of valid scenarios which may prevent a successful synchronization without a user first taking some action.
It is important to keep each synchronization attempt logged and therefore to just add a sync column to the data table is not acceptable.
The pseudo-code version of what I'm trying to get working with SQL is along these lines:
The closest working SQL statement I have so far uses a LEFT JOIN
:
SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
FROM (
SELECT data_id, name, date_created
FROM data
) AS d
LEFT JOIN (
SELECT sync_id, data_id, result, details, date_created
FROM sync
GROUP BY data_id
) AS s
ON d.data_id = s.data_id
ORDER BY d.date_created DESC;
Unfortunately this does not seem to take the most recent sync record but the GROUP BY data_id
seems to simply grab the first sync record it finds. MySQL syntax won't allow me to place an ORDER BY date_created DESC
before the GROUP BY data_id
line. If I place this ORDER BY statement after the GROUP BY line it seems to take no effect, and the most recent sync record is not the one shown alongside the data columns in the results.
A simpler version I started off with that has the exact same problem of not necessarily taking the most recent sync record:
SELECT d.data_id, d.name, d.date_created, s.result, s.details
FROM data AS d LEFT JOIN sync AS s ON d.data_id = s.data_id
WHERE s.result = 0 OR s.result IS NULL;
I've also tried using a sub-query to achieve this, again same problem:
SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
FROM (
SELECT data_id, name, date_created
FROM data
) AS d, (
SELECT s.sync_id, s.data_id, s.result, s.details, s.date_created
FROM sync AS s, data AS d
WHERE s.data_id = d.data_id
ORDER BY s.date_created DESC
) AS s
WHERE d.data_id = s.data_id
ORDER BY s.date_created DESC;
Please can someone advise how I can be certain to get only the most recent sync record alongside the data records in a single query. I'm happy for the solution to involve any combination of joins or sub-queries as required. Thank you.
You just need to use order by for sync table in your query then group by as mysql does not support order by before group by because mysql uses first group by then order by.
So you can use below query.
SELECT d.data_id, d.name, d.date_created, s.sync_id, s.result, s.details
FROM `data` AS d LEFT JOIN (
SELECT sync_id, data_id, result, details, date_created FROM (
SELECT sync_id, data_id, result, details, date_created
FROM sync
ORDER BY date_created DESC
) a GROUP BY a.data_id
) s
ON d.data_id = s.data_id
ORDER BY d.date_created DESC;