Search code examples
mysqlsqlstatisticsmedian

How to get avg time taken by grouping a particular column using sql?


I have a table like this from which I am getting total number of call_count for a particular model_id in PHX, SLC and SJC datacenter for a particular meta_date.

machine_name    hour_of_day     sec_of_hour     meta_date               client_id       model_id    call_count      avg         median      datacenter

machineA            13              168         2014-07-30 00:00:00      Test13N           65           100         38.6382       5         PHX
machineA            13              170         2014-07-30 00:00:00      Test13N           65           200         34.382        4         PHX
machineB            13              168         2014-07-30 00:00:00      Test13N           65           200         33.62         3         SLC
machineB            13              170         2014-07-30 00:00:00      Test13N           65           300         32.60         2         SLC
machineC            13              168         2014-07-30 00:00:00      Test13N           65           400         31.20         6         SJC
machineC            13              170         2014-07-30 00:00:00      Test13N           65           500         39.82         8         SJC

So with the below query -

SELECT client_id, model_id, 
  sum(case when datacenter = 'PHX' then call_count END) phx, 
  sum(case when datacenter = 'SLC' then call_count END) slc, 
  sum(case when datacenter = 'SJC' then call_count END) sjc 
FROM models b 
where meta_date= CURDATE()-1 
group by client_id, model_id 
order by client_id, model_id;

I get result as which means for model_id 65, 300 calls were made from phx, 500 calls were made from slc and 900 calls were made from sjc datacenter.

client_id   model_id    phx     slc     sjc

Test13N       65        300     500     900

Problem Statement:-

Now what I am supposed to do is, I also need to show the avg time taken in PHX, SLC and SJC datacenter for all the calls for a particular model_id. Similarly for median as well for PHX, SLC and SJC datacenter.

Meaning what is the avg time taken by 300 calls in PHX datacenter and median time taken by 300 calls in PHX datacenter. Similarly for SLC and SJC.

And I am thinking of using weighted average here. Since with the above table for machineA, 100 calls took 38.6382 ms on average in PHX datacenter for 168 second of hour and 200 calls took 34.382 ms on average in 170 second of hour.

So I am thinking of calculating avg like this for PHX datacenter using weighted average here-

(100 * 38.6382 + 200* 34.382) / (100 + 200)

so at the end I need to show the result like this -

client_id   model_id    phx     slc     sjc     phx_avg     phx_median      slc_avg     slc_median      sjc_avg     sjc_median

Test13N       65        300     500     900

How do I accomodate the above formula in my current sql so that I can calculate avg time taken by 300 calls in PHX and median time taken by 300 calls in PHX. Similarly for SLC and SJC. I am working with mysql database.


Solution

  • Your idea of using a weighted average to compute the average time is perfect.

    This query (http://sqlfiddle.com/#!2/031211/5/0) does the job just fine.

    SELECT ... 
      sum(case when datacenter = 'PHX' then call_count END) phx, 
      sum(case when datacenter = 'PHX' then call_count * avg END)/
      sum(case when datacenter = 'PHX' then call_count END) phx_avg, 
    

    You don't have sufficient information to compute the median; that requires presence of the detail records that went into the medians shown in your table. You could fake the computation somehow. But if your traffic engineers base their work on a fake median computation, your company will be sorry. Don't do that!