Search code examples
sqlsqliteinner-joincorrelated-subquery

Correlated Subquery? pulling data from different columns, same table


I am trying to pull data from different columns using multiple criteria, but having trouble figuring out how, I believe it is a correlated subquery I need and have experimented doing a few different ways but can't figure it out.

I am looking to get the averages for the Miami Heat's win for the following categories + the averages of the New York Knicks losses for the following categories and combine them into one average.

So this is my query for the Heat which retrieves exactly what i want

SELECT
    box_score.team_name, 
    ROUND(AVG(eFG),3) eFG,
    ROUND(AVG(OPP_eFG),3) OPP_eFG,
    ROUND(AVG(TOV_PCT),3) TOV_PCT,
    ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
    ROUND(AVG(ORB_PCT),3) ORB_PCT,
    ROUND(AVG(DRB_PCT),3) DRB_PCT,
    ROUND(AVG(FTA_RATE),3) FTA_RATE,
    ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07' 

I also did the losses for the knicks which also resulted in what i wanted

WHERE team_name = 'New York Knicks' AND WIN_LOSS = 'L' AND game_date < '2019-03-07' 

My problem is trying to combine the two into one query where I get the averages in the Heat's wins and the average in knick's losses. all of this information is from the same table and I can get team information from an id number or by name... I am using SQLite if that changes anything

This is the result of running the query which is what i'm looking for one row of data with the average... however I would like the average of these numbers in both Heat wins and Knicks losses combined into one row

Averages in Heat Wins

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.603  0.505    0.14     0.126        0.28     0.77     0.235     0.141

These are the averages in knicks losses

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.568  0.602    0.146    0.136        0.225    0.787    0.222     0.235

I want to get the two combined into 1 average for each category

but is there any way to make it so I get the averages to pull data from seperate columns?

In this case I'm interested in the Miami Heat, so I have the averages above but I what i'm trying to do is get the averages from the Heat to the corresponding opposite stat for the knicks (eFG should correlate with opp_eFG of other team and so on)...so basically I am looking for the averages of the following:

Heat eFG and OPP_eFG Knicks

Heat OPP_eFG and eFG Knicks

Heat TOV_PCT and OPP_TOV_PCT Knicks

Heat OPP_TOV_PCT and TOV_PCT Knicks

Heat FTA_RATE and OPP_FTA_RATE Knicks

Heat OPP_FTA_RATE and FTA_RATE Knicks

still looking to get 1 row as the result


Solution

  • One solution would be to perform the whole operation in a single table scan (without joins or subqueries), using conditional aggregation:

    SELECT  
        box_score.team_name, 
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN eFG          END),3) Heat_eFG,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN eFG          END),3) Knicks_eFG,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_eFG      END),3) Heat_OPP_eFG,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_eFG      END),3) Knicks_OPP_eFG,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN TOV_PCT      END),3) Heat_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN TOV_PCT      END),3) Knicks_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_TOV_PCT  END),3) Heat_OPP_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_TOV_PCT  END),3) Knicks_OPP_TOV_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN ORB_PCT      END),3) Heat_ORB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN ORB_PCT      END),3) Knicks_ORB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN DRB_PCT      END),3) Heat_DRB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN DRB_PCT      END),3) Knicks_DRB_PCT,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN FTA_RATE     END),3) Heat_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN FTA_RATE     END),3) Knicks_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_FTA_RATE END),3) Heat_OPP_FTA_RATE,
        ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_FTA_RATE END),3) Knicks_OPP_FTA_RATE
    FROM box_score
    WHERE team_name IN ('Miami Heat', 'New York Knicks') AND game_date < '2019-03-07' 
    

    Here is another version of the query if you are looking to compute the average, for example of both eFG for Miami wins AND OPP_eFG on New York losses in a single column. This still relies on conditional aggregation. I also slighly simplified the logic by moving conditions to the WHERE clause.

    SELECT  
        box_score.team_name, 
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN eFG 
            WHEN team_name = 'New York Knicks' THEN OPP_eFG 
        END, 3) Heats_eFG_Knicks_OPP_eFG, 
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_eFG 
            WHEN team_name = 'New York Knicks' THEN eFG 
        END, 3) Heats_OPP_eFG_Knicks_eFG,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN TOV_PCT 
            WHEN team_name = 'New York Knicks' THEN OPP_TOV_PCT 
        END, 3) Heats_TOV_PCT_Knicks_OPP_TOV_PCT,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_TOV_PCT 
            WHEN team_name = 'New York Knicks' THEN TOV_PCT 
        END, 3) Heats_OPP_TOV_PCT_Knicks_TOV_PCT,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN FTA_RATE 
            WHEN team_name = 'New York Knicks' THEN OPP_FTA_RATE 
        END, 3) Heats_FTA_RATE_Knicks_OPP_FTA_RATE,
        ROUND(AVG(CASE 
            WHEN team_name = 'Miami Heat'      THEN OPP_FTA_RATE 
            WHEN team_name = 'New York Knicks' THEN FTA_RATE 
        END, 3) Heats_OPP_FTA_RATE_Knicks_FTA_RATE
    FROM box_score
    WHERE 
        game_date < '2019-03-07' 
        AND (
               ( team_name = 'Miami Heat'      AND win_loss = 'W' )
            OR ( team_name = 'New York Knicks' AND win_loss = 'L') 
        )
    

    NB: as commented by wildpasser, you probably want to use single quotes instead of double quotes around litteral values (this is the SQL standard). I globally all of the double quotes in the original query to single quotes.