Search code examples
sqlpostgresql

How to sum 2 values from different queries?


I am trying determine how far below/above a persons average a venue performs. For example the query would return the venue, date, and how far below the expected average it performed. From the data below you can see Pauls average was 100 not including Lakeview so he went exactly his average and Mel went 20 below her average at Lakeview so Lakeview performed -10 on average below peoples average distance.

The query return would look like this:

Venue      Date           Performance
-------------------------------------
Lakeview   12/08/2024     -10

Firstly I need to group by name to get their average. However I only wish to do include their last 5 distances.

I can do that with this query

SELECT name, AVG(distance) as avg_distance
FROM 
    (SELECT 
         database.*,
         ROW_NUMBER() OVER (PARTITION BY name ORDER BY date DESC) AS seqnum
     FROM database) s
WHERE seqnum <= 5
GROUP BY name

Then I need to group by venue and find the average distance of everyone that performed on this date and at this venue. Which I can do with this query

SELECT venue, AVG(distance)
FROM "database"
GROUP BY venue, date

Then I need to subtract the distance on this date vs their average to see how it performed. What is the easiest way to do this?

Sample Data

Name  Venue     Distance   Date
-------------------------------------
Paul  Holland   120        20/07/2024
Paul  Venus     80         21/07/2024
Paul  Lakeview  100        12/08/2024
Tom   Lakeview  120        12/08/2024
Mel   Lakeview  140        12/08/2024
Mel   Hightail  160        05/05/2024

Solution

  • One option is to crate a cte (or subquery) with added columns with average distances per name and per venue/date. It could be done using Avg() Over() analytic function.

    WITH    --  S a m p l e    D a t a : 
        tbl ( a_name, venue, distance, a_date ) AS 
          ( Select 'Paul', 'Holland',   120, To_Date('20/07/2024', 'dd/mm/yyyy') Union All 
            Select 'Paul', 'Venus',      80, To_Date('21/07/2024', 'dd/mm/yyyy') Union All 
            Select 'Paul', 'Lakeview',  100, To_Date('12/08/2024', 'dd/mm/yyyy') Union All 
            Select 'Tom', 'Lakeview',   120, To_Date('12/08/2024', 'dd/mm/yyyy') Union All 
            Select 'Mel', 'Lakeview',   140, To_Date('12/08/2024', 'dd/mm/yyyy') Union All 
            Select 'Mel', 'Hightail',   160, To_Date('05/05/2024', 'dd/mm/yyyy') 
          ), 
    

    ... cte grid to get averages per name (last 5 dates) and per venue/date (all distances)

        grid AS 
          ( SELECT  a.*, 
                    Round(Avg(Case When rn_name <= 5 Then a.distance End) Over(Partition By a_name ), 2) as avg_name, 
                    Round(Avg(a.distance) Over(Partition By venue, a_date ), 2) as avg_venue_date
            FROM  ( Select Row_Number() Over(Partition By a_name Order By a_name, a_date Desc) as rn_name, 
                           a_name, venue, distance, a_date
                    From tbl
                  )  a
          ) 
    
    --      M a i n    S Q L : 
    Select    venue, a_date, Sum(avg_venue_date - avg_name) as performance
    From      grid 
    Group By  a_date, venue
    Order By  a_date, venue
    
    /*      R e s u l t : 
    venue        a_date      performance
    -----------  ----------  -----------
    Hightail    2024-05-05         10.00
    Holland     2024-07-20         20.00
    Venus       2024-07-21        -20.00
    Lakeview    2024-08-12        -10.00      */
    

    See the fiddle here.

    Addition:
    If your table name is database (which is bad but anyway) and if column names /types in that table are name /varchar, venue /varchar, distance /integer and date /date (date is also very, very bad name of a column 'cause it is datatype too) then try to use the complete code like below. There is another (first) cte that should adjust your table to the rest of the code:

    WITH    
        tbl AS 
          ( Select t.name as a_name, t.venue, t.distance, t.date as a_date
            From database t
          ), 
        grid AS 
          ( SELECT  a.*, 
                    Round(Avg(Case When rn_name <= 5 Then a.distance End) Over(Partition By a_name ), 2) as avg_name, 
                    Round(Avg(a.distance) Over(Partition By venue, a_date ), 2) as avg_venue_date
            FROM  ( Select Row_Number() Over(Partition By a_name Order By a_name, a_date Desc) as rn_name, 
                           a_name, venue, distance, a_date
                    From tbl
                  )  a
          ) 
    Select    venue, a_date, Sum(avg_venue_date - avg_name) as performance
    From      grid 
    Group By  a_date, venue
    Order By  a_date, venue