Search code examples
sqlamazon-web-servicesamazon-athena

SQL query to summarize Audit table into new Aggregate Table


I have a ticket_audit table that logs for each ticket_id logs the status of the ticket, and the duration of each status.
Each ticket_id could have up to three statuses, (waiting, respond, and validation).

I want to be able to write a query that summarizes the information like in the desired output table below.

desired output columns and values

  1. datetime - first datetimestamp of each ticket_id

  2. agent

  3. ticket_id

  4. waiting_duration - for each ticket_id, there could possible be 2 or more duration for status = 'waiting'. I would like the duration to be summed together.

  5. respond_duration - Not all ticket_id 's will have a respond duration

  6. validation_duration - Not all ticket_id 's will have a validation duration so

    ticket_audit table

datetime agent ticket_id status duration
9/1/24 5:47:43 PM john 1001 waiting 1
9/1/24 5:47:48 PM john 1001 waiting 7
9/1/24 5:47:53 PM john 1001 respond 11
9/1/24 5:47:56 PM john 1001 validation 15
9/1/24 5:47:48 PM mary 1002 waiting 3
9/1/24 5:48:53 PM mary 1002 waiting 4
9/1/24 5:48:56 PM mary 1002 respond 45
9/1/24 5:49:56 PM mary 1002 validation 53
9/2/24 12:04:39 AM dave 1003 waiting 10
9/2/24 12:04:48 AM dave 1003 respond 39
9/2/24 12:06:48 AM dave 1003 validation 54
9/2/24 12:04:39 AM john 1004 waiting 10
9/2/24 12:05:39 AM john 1004 waiting 8
9/2/24 12:04:39 AM mary 1005 waiting 3
9/2/24 12:04:48 AM mary 1005 waiting 8
9/2/24 12:06:48 AM mary 1005 respond 15

desired output

datetime agent ticket_id waiting_duration respond_duration validation_duration
9/1/24 5:47:43 PM john 1001 8 11 15
9/1/24 5:47:48 PM mary 1002 7 45 53
9/2/24 12:04:39 AM dave 1003 10 39 54
9/2/24 12:04:39 AM john 1004 18 0 0
9/2/24 12:04:39 AM mary 1005 11 15 0

Any help is appreciated. Thanks


Solution

  • Assumptions:

    • You want the earliest datetime
    • There is only a single agent per ticket_id
    • You want one row per ticket_id
    • The _duration columns are sums of the duration column (not math calculating time between statuses)

    You could simply use:

    select
      min(datetime) as datetime,
      min(agent) as agent,
      ticket_id,
      sum(case when status = 'waiting' then duration end) as waiting_duration,
      sum(case when status = 'respond' then duration end) as respond_duration,
      sum(case when status = 'validation' then duration end) as validation_duration
    from table
    group by ticket_id
    order by ticket_id