Search code examples
mysqlsubqueryweighted-average

MySQL / BigQuery - Weighted Average & Group By


I am trying to calculate a weighted average of a dataset and return the maximum value, monthly over a period of 12 months along with its' corrosponding ticket description.

I'm aware that there are tons of questions out there addressing similar problems, but I have yet to find a solution that combines the syntaxes I believe are required.

Here's some sample table data:

Month_Begin_Date Priority ticket_about_tag Phone_Time Occurances
2019-02-01 Urgent Power Bill 22.42 36
2019-02-01 Normal Power Bill 3.41 89
2019-05-01 Normal Wifi Issue 45.32 12

Here's my current query for determining the weighted average:

SELECT (Month_Begin_Date, 
(sum(phone_time * occurances))/sum(occurances)) AS Weighted_Average_Phone_Time
FROM database
GROUP BY month_begin_date

This returns the weighted average total for all ticket_about_tags, monthly.

But I still need to get this so that it displays the maximum weighted average grouped by ticket description. I.e. something that looks like this:

Month_Begin_Date ticket_about_tag Weighted_average_phone_time
2019-01-01 Power Bill 22.42
2019-02-01 Power Bill 3.41
2019-03-01 Wifi Issue 45.32

I've tried adding this as a subquery into another query in order to return the data I'm after, like so:

SELECT Month_Begin_date, Ticket_About_Tag, Phone_Average_Handle_Time 
FROM database WHERE CONCAT(month_begin_date,phone_time) IN 
(SELECT CONCAT (Month_Begin_Date, 
(sum(phone_time * occurances))/sum(occurances)) AS Weighted_Average_Phone_Time
FROM database
GROUP BY month_begin_date
)
ORDER BY month_begin_date ASC  

Thanks very much for any assistance


Solution

  • Not sure I got your question right, but using the following data:

    Month_Begin_Date Priority Ticket_About_Tag Phone_Time Occurences
    2019-02-01 Urgent Power Bill 22.42 36
    2019-02-01 Normal Power Bill 3.41 89
    2019-05-01 Normal Wifi Issue 45.32 12
    2019-02-01 Urgent Wifi Issue 14.2 7
    2019-02-01 Normal Wifi Issue 30.7 5

    Is this the query you're after?

    SELECT 
    Month_Begin_Date, Ticket_About_Tag,
    SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
    FROM `database`
    GROUP BY Month_Begin_Date, Ticket_About_Tag
    ORDER BY Month_Begin_Date ASC, Ticket_About_Tag ASC;
    

    That gives you a result like the one you posted:

    Month_Begin_Date Ticket_About_Tag Weighted_Average_Phone_Time
    2019-02-01 Power Bill 8.884880083084106
    2019-02-01 Wifi Issue 21.075000206629436
    2019-05-01 Wifi Issue 45.31999969482422

    Response to your comment

    To answer your comment you could:

    SELECT
    a.Month_Begin_Date, 
    a.Ticket_About_Tag, 
    b.Max_Weighted_Average_Phone_Time
    FROM (
    
    SELECT 
    Month_Begin_Date,
    Ticket_About_Tag,
    SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
    FROM `database`
    GROUP BY Month_Begin_Date, Ticket_About_Tag
    
    ) a
    
    LEFT JOIN (
    
    SELECT
    b1.Month_Begin_Date, 
    MAX(b1.Weighted_Average_Phone_Time) AS Max_Weighted_Average_Phone_Time
    FROM (
    
    SELECT 
    Month_Begin_Date,
    Ticket_About_Tag,
    SUM(Phone_Time * Occurences) / SUM(Occurences) AS Weighted_Average_Phone_Time
    FROM `database`
    GROUP BY Month_Begin_Date, Ticket_About_Tag
    
    ) b1
    
    GROUP BY b1.Month_Begin_Date
    
    ) b ON a.Month_Begin_Date = b.Month_Begin_Date
    
    WHERE a.Weighted_Average_Phone_Time = b.Max_Weighted_Average_Phone_Time
    

    That gives you the following output:

    Month_Begin_Date Ticket_About_Tag Max_Weighted_Average_Phone_Time
    2019-02-01 Wifi Issue 21.075000206629436
    2019-05-01 Wifi Issue 45.31999969482422

    There are other ways of doing this, but I think this is by far the easiest way to understand without using other SQL constructs. It reflects your need of going through the same data twice, first to aggregate by month and ticket tag, then to find the maximum of the aggregate data by month.