If I am performing an inner join on multiple tables, how do I ensure that the result set contains only the most recent timestamp? E.g.
SELECT
e.customer_id AS customer_id,
e.event_id AS event_id,
#MOST RECENT TIMESTAMP from car.updated_on, motorcycle.updated_on or walk.updated_on
FROM
event_table AS e
INNER JOIN car AS c ON e.customer_id = c.customer_id
INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
e.event_id = c.event_id
AND e.event_id = m.event_id
AND e.event_id = w.event_id
I have a single table that records all events that occur. I only want to pull the most recent timestamp, regardless of the event type, for all customers that are in all three events (car, motorcycle and walk).
Sample data:
event
customer_id | event_id |
---|---|
1 | 100 |
2 | 101 |
3 | 102 |
4 | 103 |
5 | 104 |
6 | 105 |
7 | 106 |
8 | 107 |
9 | 108 |
10 | 109 |
car
customer_id | event_id | car_id | updated_on |
---|---|---|---|
1 | 100 | 1 | 2021-07-23 10:09:05 |
2 | 101 | 1 | 2021-07-23 10:09:05 |
3 | 102 | 1 | 2021-07-23 10:09:05 |
4 | 103 | 1 | 2021-07-23 10:09:05 |
5 | 104 | 1 | 2021-07-23 10:09:05 |
6 | 105 | 1 | 2021-07-23 10:09:05 |
7 | 106 | 1 | 2021-07-23 10:09:05 |
8 | 107 | 1 | 2021-07-23 10:09:05 |
9 | 108 | 1 | 2021-07-23 10:09:05 |
10 | 109 | 1 | 2021-07-23 10:09:05 |
motorcycle
customer_id | event_id | motorcycle_id | updated_on |
---|---|---|---|
1 | 100 | 1 | 2021-07-23 10:09:00 |
2 | 101 | 1 | 2021-07-23 10:09:00 |
3 | 102 | 1 | 2021-07-23 10:09:00 |
4 | 103 | 1 | 2021-07-23 10:09:00 |
5 | 104 | 1 | 2021-07-23 10:09:10 |
6 | 105 | 1 | 2021-07-23 10:09:10 |
7 | 106 | 1 | 2021-07-23 10:09:00 |
8 | 107 | 1 | 2021-07-23 10:09:00 |
walk
customer_id | event_id | walk_id | updated_on |
---|---|---|---|
1 | 100 | 1 | 2021-07-23 10:09:00 |
2 | 101 | 1 | 2021-07-23 10:09:00 |
3 | 102 | 1 | 2021-07-23 10:09:00 |
4 | 103 | 1 | 2021-07-23 10:09:00 |
5 | 104 | 1 | 2021-07-23 10:09:00 |
6 | 105 | 1 | 2021-07-23 10:09:00 |
7 | 106 | 1 | 2021-07-23 10:09:00 |
8 | 107 | 1 | 2021-07-23 10:09:15 |
9 | 108 | 1 | 2021-07-23 10:09:15 |
Desired Result:
customer_id | event_id | updated_on | comment |
---|---|---|---|
1 | 100 | 2021-07-23 10:09:05 | TS from car |
2 | 101 | 2021-07-23 10:09:05 | TS from car |
3 | 102 | 2021-07-23 10:09:05 | TS from car |
4 | 103 | 2021-07-23 10:09:05 | TS from car |
5 | 104 | 2021-07-23 10:09:10 | TS from motorcycle |
6 | 105 | 2021-07-23 10:09:10 | TS from motorcycle |
7 | 106 | 2021-07-23 10:09:15 | TS from walk |
8 | 107 | 2021-07-23 10:09:15 | TS from walk |
I do not need the comment in the final result set; I added it for explanation purposes only. Effectively, I don't care what the event was. All I care about is the INNER JOIN of the four tables, so there should be 8 records at most, and I only want the timestamp value that is the most recent (highest). The customer_id and event_id need to match for all the INNER JOINs.
E.g. customer_id = 1 and event_id = 100: this exists in all 4 tables. It has three values for updated_on (one each from car, motorcycle and walk). I want MAX(2021-07-23 10:09:05, 2021-07-23 10:09:00, 2021-07-23 10:09:00), i.e. MAX(car.updated_on, motorcycle.updated_on, walk.updated_on).
Any help would be appreciated, thanks.
EDIT: I got the desired result working in two queries. Hoping to optimize into a single query.
Query 1: get the UNIQUE records between the three tables and store them in another location called event_joined. This table is fully overwritten each time and not just appended to.
SELECT
e.customer_id AS customer_id,
e.event_id AS event_id
FROM
event_table AS e
INNER JOIN car AS c ON e.customer_id = c.customer_id
INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
e.event_id = c.event_id
AND e.event_id = m.event_id
AND e.event_id = w.event_id
Query 2: using UNION, we know that all three tables will have the same row count because we previously joined them for matching records only. Now we simply GROUP BY and get the MAX (most recent) timestamp.
SELECT event_temp.customer_id, event_temp.event_id, MAX(event_temp.updated_on) AS updated_on
FROM (
SELECT c.customer_id, c.event_id, c.updated_on FROM car AS c INNER JOIN event_joined AS ej ON ej.customer_id = c.customer_id AND ej.event_id = c.event_id
UNION ALL
SELECT m.customer_id, m.event_id, m.updated_on FROM motorcycle AS m INNER JOIN event_joined AS ej ON ej.customer_id = m.customer_id AND ej.event_id = m.event_id
UNION ALL
SELECT w.customer_id, w.event_id, w.updated_on FROM walk AS w INNER JOIN event_joined AS ej ON ej.customer_id = w.customer_id AND ej.event_id = w.event_id
) AS event_temp
GROUP BY event_temp.customer_id, event_temp.event_id;
Is there any way to optimize this into a single query? Thanks.
You can use a simple query with CROSS APPLY (SQL Server syntax), as shown below:
SELECT
e.customer_id AS customer_id,
e.event_id AS event_id,
max(t.updated_On) AS updated_on
FROM
event_table AS e
INNER JOIN car AS c ON e.customer_id = c.customer_id and e.event_id = c.event_id
INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id and e.event_id = m.event_id
INNER JOIN walk AS w ON e.customer_id = w.customer_id and e.event_id = w.event_id
CROSS APPLY (values (c.updated_On),(m.updated_On),(w.updated_On)) as t(updated_On)
GROUP BY e.customer_id,
e.event_id
Sample data and working solution
declare @event table(cust_id int, event_id int)
declare @car table(cust_id int, event_id int, updated_on datetime)
declare @walk table(cust_id int, event_id int, updated_on datetime)
insert into @event values (1, 100)
insert into @car values (1,100, '2020-01-01')
insert into @walk values(1,100, '2020-02-01')
SELECT
e.cust_id AS customer_id,
e.event_id AS event_id,
max(t.updatedOn) as recent_timestamp
FROM
@event AS e
INNER JOIN @car AS c ON e.cust_id = c.cust_id and e.event_id = c.event_id
INNER JOIN @walk AS w ON e.cust_id = w.cust_id and e.event_id = w.event_id
CROSS APPLY (values(c.updated_on),(w.updated_on)) as t(updatedOn)
group by e.cust_id, e.event_id
customer_id | event_id | recent_timestamp |
---|---|---|
1 | 100 | 2020-02-01 00:00:00.000 |
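As an alternative sketch, if your server provides a GREATEST function (SQL Server 2022 and Azure SQL do, as do MySQL and PostgreSQL), the per-row maximum can be taken directly, with no CROSS APPLY and no GROUP BY. The example below is T-SQL and assumes that each (customer_id, event_id) pair appears at most once per table, as in the question's sample data; the table variables hold only a few of the sample rows and are there for illustration.

```sql
-- Table variables loaded with a few rows from the question's sample data.
-- datetime2 is used so the ISO-style literals parse the same under any language setting.
DECLARE @event      TABLE (customer_id int, event_id int);
DECLARE @car        TABLE (customer_id int, event_id int, updated_on datetime2(0));
DECLARE @motorcycle TABLE (customer_id int, event_id int, updated_on datetime2(0));
DECLARE @walk       TABLE (customer_id int, event_id int, updated_on datetime2(0));

INSERT INTO @event VALUES (1, 100), (5, 104), (8, 107), (9, 108);

INSERT INTO @car VALUES
    (1, 100, '2021-07-23 10:09:05'),
    (5, 104, '2021-07-23 10:09:05'),
    (8, 107, '2021-07-23 10:09:05'),
    (9, 108, '2021-07-23 10:09:05');

-- Customer 9 has no motorcycle row, so the inner joins will drop it.
INSERT INTO @motorcycle VALUES
    (1, 100, '2021-07-23 10:09:00'),
    (5, 104, '2021-07-23 10:09:10'),
    (8, 107, '2021-07-23 10:09:00');

INSERT INTO @walk VALUES
    (1, 100, '2021-07-23 10:09:00'),
    (5, 104, '2021-07-23 10:09:00'),
    (8, 107, '2021-07-23 10:09:15'),
    (9, 108, '2021-07-23 10:09:15');

-- GREATEST compares the three joined columns row by row.
-- Assumption: requires SQL Server 2022+ / Azure SQL (or another engine with GREATEST).
SELECT
    e.customer_id,
    e.event_id,
    GREATEST(c.updated_on, m.updated_on, w.updated_on) AS updated_on
FROM @event AS e
INNER JOIN @car        AS c ON e.customer_id = c.customer_id AND e.event_id = c.event_id
INNER JOIN @motorcycle AS m ON e.customer_id = m.customer_id AND e.event_id = m.event_id
INNER JOIN @walk       AS w ON e.customer_id = w.customer_id AND e.event_id = w.event_id
ORDER BY e.customer_id;

-- Expected rows: (1, 100, 10:09:05), (5, 104, 10:09:10), (8, 107, 10:09:15)
```

If a table can hold several rows per (customer_id, event_id), this version returns one row per joined combination, so the CROSS APPLY with GROUP BY shown above is the safer choice there. NULL handling in GREATEST also differs between engines, so check your server's documentation if updated_on is nullable.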