Search code examples
sqloracle-databasetop-n

The average duration of first 80% records


For example I have the following list of calls:

  • Call#1 - duration 30 min
  • Call#2 - duration 43 min
  • Call#3 - duration 26 min
  • Call#4 - duration 35 min
  • Call#5 - duration 39 min

1) I need the SQL query retrieving the average duration of first 80% of calls.

Calls that fall into first 80% (according to call duration) are calls #3,#1,#4,#5. For these calls the average duration should be calculated ((26+30+35+39)/4=32,5). Calls above 80% (here call #2) should be ignored.

2) Also I need the vice versa query - what percentage of first calls will have the average call duration of 30 mins?

3) Hot to fetch the duration of 80%-th record (ordered according to call duration). E.g. if there's 500 records, what's the duration of 400th record?

How this SQL queries should look like (Oracle)?


Solution

  • The NTILE() function splits a data-set into buckets; for the top 80% split into 5 and take the top 4:

    select avg(duration)
      from ( select duration, ntile(5) over (order by duration) as bucket
               from ...
                    )
     where bucket <= 4
    

    If you're using Oracle 12c then the row limiting clause has had a ridiculously big functionality upgrade and you can get the actual percentage, something like:

    select avg(duration)
      from ...
     order by duration
     fetch first 80 percent rows with ties
    

    This selects the first 80% of rows in the order of the column DURATION ascending but where there are tied records accepting all of them. Use only instead of with ties to only return the specified percentage.

    There's lots of options, which this blog post also explains pretty well.


    For working out what percentage of calls will have the average call duration of 30 minutes you need to know the running average, the running count and the total number of rows in the table. For a running average the analytic AVG() should work and the analytic COUNT() for the total number of rows:

    select max(running_count) / max(total_calls)
      from ( select duration
                  , count(*) over () as total_calls
                  , row_number() over (order by duration) as running_count
                  , avg(duration) over (order by duration) as running_avg
               from ...
                    )
     where running_avg <= 30