Tags: sql, sql-server, greatest-n-per-group, sql-server-2016

How to select only the most recent timestamp?


If I am performing an inner join on multiple tables, how do I ensure that the result set contains only the most recent timestamp? For example:

SELECT
  e.customer_id AS customer_id,
  e.event_id AS event_id,
  -- MOST RECENT TIMESTAMP from car.updated_on, motorcycle.updated_on or walk.updated_on
FROM
  event_table AS e
  INNER JOIN car AS c ON e.customer_id = c.customer_id
  INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
  INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
  e.event_id = c.event_id
  AND e.event_id = m.event_id
  AND e.event_id = w.event_id

I have a single table that records all events that occur. I only want to pull the most recent timestamp, regardless of the event type, for all customers that appear in all three event tables (car, motorcycle and walk).

Sample data:

event

customer_id event_id
1 100
2 101
3 102
4 103
5 104
6 105
7 106
8 107
9 108
10 109

car

customer_id event_id car_id updated_on
1 100 1 2021-07-23 10:09:05
2 101 1 2021-07-23 10:09:05
3 102 1 2021-07-23 10:09:05
4 103 1 2021-07-23 10:09:05
5 104 1 2021-07-23 10:09:05
6 105 1 2021-07-23 10:09:05
7 106 1 2021-07-23 10:09:05
8 107 1 2021-07-23 10:09:05
9 108 1 2021-07-23 10:09:05
10 109 1 2021-07-23 10:09:05

motorcycle

customer_id event_id motorcycle_id updated_on
1 100 1 2021-07-23 10:09:00
2 101 1 2021-07-23 10:09:00
3 102 1 2021-07-23 10:09:00
4 103 1 2021-07-23 10:09:00
5 104 1 2021-07-23 10:09:10
6 105 1 2021-07-23 10:09:10
7 106 1 2021-07-23 10:09:00
8 107 1 2021-07-23 10:09:00

walk

customer_id event_id walk_id updated_on
1 100 1 2021-07-23 10:09:00
2 101 1 2021-07-23 10:09:00
3 102 1 2021-07-23 10:09:00
4 103 1 2021-07-23 10:09:00
5 104 1 2021-07-23 10:09:00
6 105 1 2021-07-23 10:09:00
7 106 1 2021-07-23 10:09:00
8 107 1 2021-07-23 10:09:15
9 108 1 2021-07-23 10:09:15

Desired Result:

customer_id event_id updated_on comment
1 100 2021-07-23 10:09:05 TS from car
2 101 2021-07-23 10:09:05 TS from car
3 102 2021-07-23 10:09:05 TS from car
4 103 2021-07-23 10:09:05 TS from car
5 104 2021-07-23 10:09:10 TS from motorcycle
6 105 2021-07-23 10:09:10 TS from motorcycle
7 106 2021-07-23 10:09:05 TS from car
8 107 2021-07-23 10:09:15 TS from walk

I do not need the comment in the final result set; I added it for explanation purposes only. Effectively, I don't care what the event was. All I care about is the INNER JOIN of the four tables, so there should be 8 records at most, and I only want the timestamp value that is the most recent (highest). The customer_id and event_id need to match for all the INNER JOINs.

E.g.: customer_id = 1 and event_id = 100 exists in all 4 tables. It has three values for updated_on (one each from car, motorcycle and walk). I want MAX(2021-07-23 10:09:05, 2021-07-23 10:09:00, 2021-07-23 10:09:00), i.e. MAX(car.updated_on, motorcycle.updated_on, walk.updated_on).

Any help would be appreciated, thanks.

EDIT: I got the desired result working in two queries. Hoping to optimize into a single query.

  1. Get only the UNIQUE records between the three tables and store them in another location called event_joined. This table is fully overwritten each time and not just appended to.
SELECT
  e.customer_id AS customer_id,
  e.event_id AS event_id
FROM
  event_table AS e
  INNER JOIN car AS c ON e.customer_id = c.customer_id
  INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id
  INNER JOIN walk AS w ON e.customer_id = w.customer_id
WHERE
  e.event_id = c.event_id
  AND e.event_id = m.event_id
  AND e.event_id = w.event_id
  2. Prior to doing the UNION ALL, we know that all three tables will contribute the same number of rows because we previously joined them for matching records only. Now we simply GROUP BY and get the MAX (most recent) timestamp.
SELECT event_temp.customer_id, event_temp.event_id, MAX(event_temp.updated_on) AS updated_on
FROM (
  SELECT c.customer_id, c.event_id, c.updated_on FROM car AS c INNER JOIN event_joined AS ej ON ej.customer_id = c.customer_id AND ej.event_id = c.event_id
  UNION ALL
  SELECT m.customer_id, m.event_id, m.updated_on FROM motorcycle AS m INNER JOIN event_joined AS ej ON ej.customer_id = m.customer_id AND ej.event_id = m.event_id
  UNION ALL
  SELECT w.customer_id, w.event_id, w.updated_on FROM walk AS w INNER JOIN event_joined AS ej ON ej.customer_id = w.customer_id AND ej.event_id = w.event_id
) AS event_temp
GROUP BY event_temp.customer_id, event_temp.event_id;

Is there any way to optimize this into a single query? Thanks.


Solution

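  • You can do this in a single query with CROSS APPLY over a VALUES list, as given below. The VALUES list turns the three updated_on columns into rows, so MAX can aggregate them per customer_id and event_id: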

    SELECT
      e.customer_id AS customer_id,
      e.event_id AS event_id,
      MAX(t.updated_On) AS updated_on
    FROM
      event_table AS e
      INNER JOIN car AS c ON e.customer_id = c.customer_id and e.event_id = c.event_id
      INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id and e.event_id = m.event_id
      INNER JOIN walk AS w ON e.customer_id = w.customer_id and e.event_id = w.event_id
      CROSS APPLY (values (c.updated_On),(m.updated_On),(w.updated_On)) as t(updated_On)
    GROUP BY e.customer_id,
      e.event_id
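
As a variation on the query above, the MAX can be computed inside the APPLY itself, which avoids the outer GROUP BY. This is only a sketch: it assumes each (customer_id, event_id) pair appears at most once in each of car, motorcycle and walk, as in the question's sample data. With duplicates, the joins would return several rows per pair. The alias v is introduced here purely for illustration.

```sql
SELECT
  e.customer_id,
  e.event_id,
  t.updated_on
FROM
  event_table AS e
  INNER JOIN car AS c ON e.customer_id = c.customer_id AND e.event_id = c.event_id
  INNER JOIN motorcycle AS m ON e.customer_id = m.customer_id AND e.event_id = m.event_id
  INNER JOIN walk AS w ON e.customer_id = w.customer_id AND e.event_id = w.event_id
  -- Turn the three timestamps of the current joined row into three rows
  -- and keep only the largest one (assumed alias: v).
  CROSS APPLY (
    SELECT MAX(v.updated_on) AS updated_on
    FROM (VALUES (c.updated_on), (m.updated_on), (w.updated_on)) AS v(updated_on)
  ) AS t;
```

On SQL Server 2022 and later, GREATEST(c.updated_on, m.updated_on, w.updated_on) should give the same result more directly, but that function is not available on SQL Server 2016, which the question is tagged with.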
    

    Sample data and working solution

    declare @event table(cust_id int, event_id int)
    declare @car table(cust_id int, event_id int, updated_on datetime)
    declare @walk table(cust_id int, event_id int, updated_on datetime)
    
    insert into @event values (1, 100)
    insert into @car values (1,100, '2020-01-01')
    insert into @walk values(1,100, '2020-02-01')
    
    SELECT
      e.cust_id AS customer_id,
      e.event_id AS event_id,
      MAX(t.updatedOn) AS recent_timestamp
    FROM
      @event AS e
      INNER JOIN @car AS c ON e.cust_id = c.cust_id and e.event_id = c.event_id
      INNER JOIN @walk AS w ON e.cust_id = w.cust_id and e.event_id = w.event_id
      CROSS APPLY (values(c.updated_On),(w.updated_on)) as t(updatedOn)
      group by e.cust_id, e.event_id
    
    customer_id event_id recent_timestamp
    1 100 2020-02-01 00:00:00.000
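
To check the approach against the question's own tables, here is a minimal sketch that loads a subset of the sample data (customers 1, 5, 8 and 9) into table variables and runs the three-table version of the query. The table variable names and the choice of datetime2(0) are assumptions made for this example; the question does not state the column types.

```sql
-- Subset of the question's sample data; datetime2(0) is an assumed column type.
DECLARE @event      TABLE (customer_id int, event_id int);
DECLARE @car        TABLE (customer_id int, event_id int, updated_on datetime2(0));
DECLARE @motorcycle TABLE (customer_id int, event_id int, updated_on datetime2(0));
DECLARE @walk       TABLE (customer_id int, event_id int, updated_on datetime2(0));

INSERT INTO @event VALUES (1, 100), (5, 104), (8, 107), (9, 108);

INSERT INTO @car VALUES
  (1, 100, '2021-07-23T10:09:05'),
  (5, 104, '2021-07-23T10:09:05'),
  (8, 107, '2021-07-23T10:09:05'),
  (9, 108, '2021-07-23T10:09:05');

-- Customer 9 has no motorcycle row, so the inner joins drop it.
INSERT INTO @motorcycle VALUES
  (1, 100, '2021-07-23T10:09:00'),
  (5, 104, '2021-07-23T10:09:10'),
  (8, 107, '2021-07-23T10:09:00');

INSERT INTO @walk VALUES
  (1, 100, '2021-07-23T10:09:00'),
  (5, 104, '2021-07-23T10:09:00'),
  (8, 107, '2021-07-23T10:09:15'),
  (9, 108, '2021-07-23T10:09:15');

SELECT
  e.customer_id,
  e.event_id,
  MAX(t.updated_on) AS updated_on
FROM
  @event AS e
  INNER JOIN @car AS c ON e.customer_id = c.customer_id AND e.event_id = c.event_id
  INNER JOIN @motorcycle AS m ON e.customer_id = m.customer_id AND e.event_id = m.event_id
  INNER JOIN @walk AS w ON e.customer_id = w.customer_id AND e.event_id = w.event_id
  -- One row per source timestamp, then aggregate back to one row per event.
  CROSS APPLY (VALUES (c.updated_on), (m.updated_on), (w.updated_on)) AS t(updated_on)
GROUP BY e.customer_id, e.event_id
ORDER BY e.customer_id;
```

With this subset, the query returns three rows: customer 1 with 10:09:05 (from car), customer 5 with 10:09:10 (from motorcycle) and customer 8 with 10:09:15 (from walk). Customer 9 is absent because it does not appear in all three tables.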