Search code examples
selectinner-join

Year range and SUM from two tables using Inner Join


I have two tables - Batting with Year field for each record that spans Persons career and Person has DebutYear and Final Year. All 3 of these are smallint. I am using the following query to get ONLY the Years from Batting in a specified career range between DebutYear+value to DebutYear+2nd-value. In this case it should only SUM the AB for those years but it is summing ALL the years in the Batting table. I am a little inexperienced with JOINS. Any help would be appreciated!

SELECT CONCAT(FirstName, ' ', LastName) AS Name, B.PersonID, SUM(AB) AS AB
  FROM Batting B
  INNER JOIN Person P ON P.PersonID = B.PersonID
    AND (B.Year >= (P.DebutYear+1) AND B.Year <= (P.DebutYear+2))
  WHERE B.PersonID = P.PersonID 
  GROUP BY B.PersonID
  HAVING (AB >= 10000)
  ORDER BY AB Desc, LastName, FirstName
  LIMIT 1000

Sample data:

PersonID Name DebutYear
rosepe01 Pete Rose 1963
Batting PersonID Year AB
1 rosepe01 1963 623
2 rosepe01 1964 516
3 rosepe01 1965 670
4 rosepe01 1966 654
5 rosepe01 1967 585
6 rosepe01 1968 626

Solution

  • The filter should be applied in the WHERE clause, not the JOIN.

    Note: Your GROUP BY clause could just be the B.PersonID if you wanted, as long as it is an indexed primary key. I kept FirstName and LastName since they are referenced in the SELECT statement.

    SELECT CONCAT(P.FirstName, ' ', P.LastName) AS Name, B.PersonID, SUM(B.AB) AS AB
      FROM Batting B
      INNER JOIN Person P ON P.PersonID = B.PersonID
      WHERE B.Year BETWEEN (P.DebutYear + 1) AND (P.DebutYear + 2)
      GROUP BY B.PersonID, P.FirstName, P.LastName -- Could just be B.PersonID
      HAVING SUM(B.AB) >= 10000
      ORDER BY AB DESC, P.LastName, P.FirstName
      LIMIT 1000;
    

    ATTENTION: According to your data, Pete Rose had 1,186 at bats (AB) in his second and third year. Your threshold is checking for an AB cound at or exceeding 10,000. I think your query is off by a factor of 10. Did you get that confused with the limit?


    Example

    Here is a working example that I verified. I included setup, data, and the query from above. Each step can be executed as one contiguous script within a sandbox.

    Setup

    Update: I tried to match your schema according to your comment.

    Setup or re-create the tables:

    -- Reset
    DROP TABLE IF EXISTS Batting;
    DROP TABLE IF EXISTS Person;
    
    -- Schemas
    CREATE TABLE Person (
        PersonID VARCHAR(12) PRIMARY KEY,
        FirstName VARCHAR(100),
        LastName VARCHAR(100),
        DebutYear SMALLINT(4),
        FinalYear SMALLINT(4)
    );
    
    CREATE TABLE Batting (
        BattingID INT PRIMARY KEY,
        PersonID VARCHAR(12),
        Year SMALLINT(4),
        AB SMALLINT(3),  -- At Bats
        FOREIGN KEY (PersonID) REFERENCES Person(PersonID)
    );
    

    Data

    Update: I added the Pete Rose stats.

    Here is the sample data to query against:

    -- Sample data
    INSERT INTO Person (PersonID, FirstName, LastName, DebutYear, FinalYear)
    VALUES 
    ('doejo01'   , 'John'  , 'Doe'   , 2000 , 2010),
    ('smithji01' , 'Jimmy' , 'Smith' , 1995 , 2005),
    ('brownja01' , 'James' , 'Brown' , 1972 , 1982),
    ('rosepe01'  , 'Pete'  , 'Rose'  , 1963 , 1986);
    
    INSERT INTO Batting (BattingID, PersonID, Year, AB)
    VALUES 
    ( 1, 'doejo01',   2001,  5000), -- John Doe    (2001, Year 2, total so far: 5000)
    ( 2, 'doejo01',   2002,  6000), -- John Doe    (2002, Year 3, total so far: 11000)
    ( 3, 'doejo01',   2003,  7000), -- John Doe    (2003, Year 4, outside range, not included)
    ( 4, 'smithji01', 1996,  4000), -- Jimmy Smith (1996, Year 2, total so far: 4000)
    ( 5, 'smithji01', 1997,  5500), -- Jimmy Smith (1997, Year 3, total so far: 9500, below threshold)
    ( 6, 'brownja01', 1972, 10000), -- James Brown (1973, Year 1, outside range)
    ( 7, 'brownja01', 1973,  6000), -- James Brown (1973, Year 2, total so far: 6000)
    ( 8, 'brownja01', 1974,  7000), -- James Brown (1974, Year 3, total so far: 13000)
    ( 9, 'rosepe01',  1963,   623), -- Pete Rose   (1963, Year 1, outside range)
    (10, 'rosepe01',  1964,   516), -- Pete Rose   (1963, Year 2, total so far: 516)
    (11, 'rosepe01',  1965,   670), -- Pete Rose   (1963, Year 3, total so far: 1186, below threshold)
    (12, 'rosepe01',  1966,   654), -- Pete Rose   (1963, Year 4, outside range, not included)
    (13, 'rosepe01',  1967,   585), -- Pete Rose   (1963, Year 5, outside range, not included)
    (14, 'rosepe01',  1968,   626); -- Pete Rose   (1963, Year 6, outside range, not included)
    

    Query

    Here is the query from above:

    -- Query for players with an AB >= 10000 between DebutYear + 1 and DebutYear + 2
    SELECT CONCAT(P.FirstName, ' ', P.LastName) AS Name, B.PersonID, SUM(B.AB) AS AB
      FROM Batting B
      INNER JOIN Person P ON P.PersonID = B.PersonID
      WHERE B.Year BETWEEN (P.DebutYear + 1) AND (P.DebutYear + 2)
      GROUP BY B.PersonID, P.FirstName, P.LastName
      HAVING SUM(B.AB) >= 10000
      ORDER BY AB DESC, P.LastName, P.FirstName
      LIMIT 1000;
    

    Result

    The result should only include James and John, but not Jimmy.

    Name PersonID AB
    James Brown 3 13000
    John Doe 1 11000