We have 2 tables that contain a `global_id` to join on. Let's call them `device_updates` and `client_updates`. Their respective tables (albeit reduced to help solve the problem) look like the following:

`device_updates`

update_id | global_id | created | device_date |
---|---|---|---|
Integer | Integer | Datetime | Datetime |

`client_updates`

client_update_id | global_id | updated | updated_by | update_type |
---|---|---|---|---|
Integer | Integer | Datetime | Integer | String |
*Note that the real table stores an Integer for the `update_type`; the String here is just for clarity.*
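
For reference, the reduced tables map to something like the SQLAlchemy models below (a minimal sketch just to make the column types concrete; the class names, the `Base` setup and the indexes are illustrative rather than our real code). In the query snippets further down I simply refer to the mapped classes as `device_updates` and `client_updates`.

```python
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DeviceUpdate(Base):
    """Row written every time a device changes state."""
    __tablename__ = "device_updates"

    update_id = Column(Integer, primary_key=True)
    global_id = Column(Integer, index=True)   # shared with client_updates
    created = Column(DateTime)                # when the row was stored
    device_date = Column(DateTime)            # when the device reported the change


class ClientUpdate(Base):
    """Row written when a client acts on a notification, e.g. silences an alarm."""
    __tablename__ = "client_updates"

    client_update_id = Column(Integer, primary_key=True)
    global_id = Column(Integer, index=True)   # shared with device_updates
    updated = Column(DateTime)                # when the client acted
    updated_by = Column(Integer)              # id of the acting client
    update_type = Column(String)              # Integer in the real table
```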
When the device changes state, it always writes to the `device_updates` table. If this is an alarm state, then the client is notified and can choose to 'update' that alarm, for example to silence it.
What we need to be able to do is generate a history of all events from the device updates and client updates in chronological order. But since neither table is aware of the other (apart from the `global_id`), I can't figure out how to ensure that each returned row contains the exact state at that point in time. Also, because a few thousand devices may update every second, and a client can silence all of them with a single click, we can't add a column to `client_updates` that references the matching `device_updates` row: looking that row up per insert is too slow, and we were also planning to do bulk inserts to meet the speed requirements, which means we can't assign a matched id at write time.
As an example flow, what should be returned in an 'event_history' query is the following:

update_id | global_id | created | device_date | client_update_id | updated | updated_by | update_type |
---|---|---|---|---|---|---|---|
1 | 15 | 10:53:24 | 10:53:23 | Null | Null | Null | Null |
2 | 15 | 10:53:24 | 10:53:23 | 1 | 10:53:34 | 7 | "Silence" |
3 | 15 | 10:57:09 | 10:57:08 | Null | Null | Null | Null |
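
In plain terms, the rule behind that output is: every device update appears once, and each client update attaches to the newest device update for the same `global_id` whose `device_date` precedes the client's `updated` time. Purely to pin the semantics down (not how I intend to implement it), here is an in-memory sketch of that rule over dict rows; the tie-break on `update_id` is an assumption on my part:

```python
def build_event_history(device_rows, client_rows):
    """Pair each client update with the newest prior device update
    (same global_id, device_date < updated); device updates that nothing
    attached to are emitted alone with null client columns."""
    attached = {}  # device update_id -> list of client update dicts
    for c in client_rows:
        candidates = [d for d in device_rows
                      if d["global_id"] == c["global_id"]
                      and d["device_date"] < c["updated"]]
        if not candidates:
            continue  # client update with no earlier device update
        # Newest device_date wins; ties broken by update_id (assumed).
        best = max(candidates, key=lambda d: (d["device_date"], d["update_id"]))
        attached.setdefault(best["update_id"], []).append(c)

    no_client = {"client_update_id": None, "updated": None,
                 "updated_by": None, "update_type": None}
    events = []
    for d in device_rows:
        for c in attached.get(d["update_id"], [no_client]):
            events.append({**d, **c})
    # Chronological order: client action time if present, else device created.
    events.sort(key=lambda e: e["updated"] or e["created"])
    return events
```

Doing this row by row in Python clearly won't keep up with thousands of device updates per second, which is why I'm after a set-based join that expresses the same rule.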
To sum up: how do I join the `device_updates` table to the `client_updates` table, given that a device update won't necessarily have an associated client update, but every client update needs to find the exact device update that matches its `updated` time (noting that the client update always occurs after the `device_date`)? Also note that there are potentially many thousands of devices, so this isn't just for a single `global_id`.
UPDATE: I forgot to add an example of a query I have tried, which is below:
```python
db.query(
    client_updates,
    device_updates
).outerjoin(
    device_updates,
    # Intended: pair each client update with the newest device update that
    # has the same global_id and a device_date earlier than the client's
    # updated time.
    device_updates.device_update_id == db.query(
        device_updates
    ).filter(
        device_updates.global_id == client_updates.global_id,
        device_updates.device_date < client_updates.updated
    ).order_by(
        device_updates.updated.desc()
    ).first().device_update_id  # .first() returns a single row object here
)
```
But this appears to match everything with the same `device_update_id`.
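
My guess is that calling `.first()` executes the inner query once, in Python, so the join condition compares against a single fixed id instead of being re-evaluated for each row. What I was trying to express is, I think, a correlated lookup that stays inside the SQL, shaped roughly like the sketch below (untested, SQLAlchemy 1.4-style, and using the `update_id` name from the reduced schema above; I don't know whether it is correct or fast enough at our volumes):

```python
from sqlalchemy.orm import aliased

# Alias so the subquery's device_updates doesn't collide with the outer one.
other_device = aliased(device_updates)

# For each client_updates row: the id of the newest device update with the
# same global_id whose device_date precedes the client's updated time.
latest_device_id = (
    db.query(other_device.update_id)
    .filter(
        other_device.global_id == client_updates.global_id,
        other_device.device_date < client_updates.updated,
    )
    .order_by(other_device.device_date.desc())
    .limit(1)
    .correlate(client_updates)  # keep client_updates out of the subquery FROM
    .scalar_subquery()
)

event_history = (
    db.query(device_updates, client_updates)
    .outerjoin(client_updates, device_updates.update_id == latest_device_id)
    .all()
)
```

Whether the database can execute that correlated subquery efficiently across many thousands of devices is exactly the part I'm not sure about.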
Ended up going with the Kafka solution. Super speedy results: my test build comfortably handled a very high throughput, and I now have confidence that it will be robust and fast.