I'm working with enterprise data that looks like this.
| load_number | id | time | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1442 | 570732257 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1442 | 570732257 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
| 1442 | 570732258 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1442 | 570732258 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
The issue is that the company has bad data practices and changes/reuses IDs, but only updates the load_number
field.
How do I construct my sql query to pull the latest loaded data like so:
| load_number | id | time | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB |
| 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB |
| 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
Basically every field will match except the id
and the load_number
. So given that every field matches except those two fields, can I remove 'duplicates' by taking the row with the higher load_number
.
I was thinking of some sort of descending rank()
on the load_number
,
Any help is much appreciated!
Try something like this
with max_load_numbers_by_id AS (
SELECT et.id, MAX(et.load_number) AS max_load_number
FROM enterprise_table et
GROUP BY et.id
)
SELECT et.*
FROM enterprise_table et
JOIN max_load_numbers_by_id mlnbi
ON et.id = mlnbi.id
AND et.max_load_number = mlnbi.load_number