Search code examples
mysqldml

Ponderate average MYSQL


We have a little simulator of a tour-operator DB (MYSQL) and we are asked to get a Query that gives us the weighted avg of duration of the tours that we have.

https://en.wikipedia.org/wiki/Weighted_arithmetic_mean

Using subquery I got to this point where I have the days that each tour lasts and the weight of each tour from the total of tours, but I am stuck and don't know how to get the weighted avg from here. I know I have to use another select from the result I already got but I would appreciate some help.

SQLfiddle down here:

http://sqlfiddle.com/#!9/53d80/2

Tables and data

CREATE TABLE STAGE
(
    ID INT AUTO_INCREMENT NOT NULL,
    TOUR INT NOT NULL,
    TYPE INT NOT NULL,
    CITY INT NOT NULL,
    DAYS INT NOT NULL,
    PRIMARY KEY (ID)
);

CREATE TABLE TOUR
(
    ID INT AUTO_INCREMENT NOT NULL,
    DESCRIPTION VARCHAR(255) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI 
    NOT NULL,
    STARTED_ON DATE NOT NULL,
    TYPE INT NOT NULL,
    PRIMARY KEY (ID)
);

INSERT INTO TOUR (DESCRIPTION, STARTED_ON, TYPE) VALUES 
('Mediterranian Cruise','2018-01-01',3),
('Trip to Nepal','2017-12-01',1),
('Tour in Nova York','2015-04-24',5),
('A week at the Amazones','2014-09-11',2),
('Visiting the Machu Picchu','2013-02-19',4);

INSERT INTO STAGE (TOUR, TYPE, CITY, DAYS) VALUES 
(1, 1, 38254, 1),
(1, 2, 22460, 3),
(1, 2, 47940, 3),
(1, 2, 42600, 4),
(1, 3, 38254, 1),
(2, 1, 13097, 1),
(2, 2, 29785, 5),
(2, 3, 13097, 1),
(3, 1, 788, 2); ,
(3, 2, 48019, 6),
(3, 3, 788, 1),
(4, 1, 38254, 2),
(4, 2, 8703, 3);,
(4, 3, 38254, 4),
(5, 1, 10453, 1),
(5, 2, 32045, 5),
(5, 3, 10453, 2);

Query:

SELECT
    AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
    COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
    SELECT
        TOUR.ID AS TOUR_ID, 
        SUM(DAYS) AS TOUR_DAYS,
        COUNT(STAGE.ID) AS STAGE_DAYS
    FROM
        TOUR
    INNER JOIN
        STAGE
    ON
        TOUR.ID = STAGE.TOUR
    GROUP BY
        TOUR.ID
) AS TD
GROUP BY 
    TD.TOUR_DAYS

weigthed avg would be:
(1×7+1×8+2×9+1×12) / (1+1+2+1) = 9


Solution

  • Wheighted AVG can be calculated with SUM(value * wheight) / SUM(wheight). In your case:

    SELECT SUM(AVERAGE_DAYS * WEIGHT) / SUM(WEIGHT)
    FROM (
        SELECT
                AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
                COUNT(TD.TOUR_ID) AS WEIGHT
            FROM
            (
                SELECT
                    TOUR.ID AS TOUR_ID, 
                    SUM(DAYS) AS TOUR_DAYS,
                    COUNT(STAGE.ID) AS STAGE_DAYS
                FROM
                    TOUR
                INNER JOIN
                    STAGE
                ON
                    TOUR.ID = STAGE.TOUR
                GROUP BY
                    TOUR.ID
            ) AS TD
            GROUP BY 
                TD.TOUR_DAYS
    ) sub
    

    http://sqlfiddle.com/#!9/53d80/4

    I'm not 100% sure, but it looks like the following query is doing exactly the same:

    SELECT AVG(TOUR_DAYS)
    FROM (
      SELECT TOUR, SUM(DAYS) AS TOUR_DAYS
      FROM STAGE
      GROUP BY TOUR
    ) sub;
    

    Or even without any subqueries:

    SELECT SUM(DAYS) / COUNT(DISTINCT TOUR)
    FROM STAGE;
    

    That would mean, the requirement should be simplified to "Get average number of days per tour".