Search code examples
sqlsql-servergroup-byself-joinhaving

Get latest and earliest value of different column from same table sql


I have a table with a selection of dates. I want to be able to apply a WHERE statement to return a group of them and then return which date is the earliest from one column and which one is the latest from another column. Here is an example table:

ID  StartDate   EndDate     Person   Subject
1   01/03/2010  03/03/2010   Paul    Math
2   12/05/2010  22/05/2010   Steve   Science
3   04/03/2010  08/03/2010   Paul    English

So I want to return all the records where Person = 'Paul'. But return something like (earliest ) StartDate = 01/03/2010 (from record ID 1) and (latest) EndDate = 08/03/2010 (from record ID 3) and Subject = English from latest EndDate (from record 3 ). I need two rows for Paul with the above data and Id.Need all row with person = 'Paul'

Expected Result :

ID  StartDate   EndDate Person  Subject
1   2010-03-01  2010-03-08  Paul    English
3   2010-03-01  2010-03-08  Paul    English

Do you see any problem regarding performance (Millions of records) in below sql query :

 Select PT3.ID,PT4.EarliestStartDate AS StartDate,PT4.EndDate,PT4.Person,PT4.Subject from Data AS PT3 
Join( SELECT Top 1 with ties * FROM Data AS PT  
Join( SELECT PT1.Person as Person1, MIN(PT1.StartDate ) as EarliestStartDate FROM Data AS PT1 where Person ='Paul' group by PT1.Person)  AS PT2 ON  PT.Person =PT2.Person1WHERE Person  ='Paul'
  Order By Row_Number() over (Partition By PT.Person Order By PT.EndDate desc)) AS PT4 ON PT3.Person = PT4.Person 

Solution

  • This could work:

    1. Perform the group by first to get the required dates for each person.
    2. Move the grouping to a common table expression (cte).
    3. Join back with the full data set to fetch all rows.
    4. For each result record, select the corresponding subject with a cross apply.
    5. Optional: filter on the required person (where clause).

    Sample data

    create table Data
    (
      ID int,
      StartDate date,
      EndDate date,
      Person nvarchar(10),
      Subject nvarchar(10)
    );
    
    insert into Data (ID, StartDate, EndDate, Person, Subject) values
    (1, '2010-03-01', '2010-03-03', 'Paul',  'Math'),
    (2, '2010-05-12', '2010-05-22', 'Steve', 'Science'),
    (3, '2010-03-04', '2010-03-08', 'Paul',  'English');
    

    Solution

    with cte as
    (
      select min(d.StartDate) as StartDate,
             max(d.EndDate) as EndDate,
             d.Person
      from Data d
      group by d.Person
    )
    select d.Id,
           c.StartDate,
           c.EndDate,
           c.Person,
           x.Subject
    from cte c
    join data d
      on d.Person = c.Person
    cross apply ( select top 1 d2.Subject
                  from Data d2
                  where d2.Person = c.Person
                    and d2.EndDate = c.EndDate ) x
    where d.Person = 'Paul'
    order by d.Person;
    

    Fiddle to see it in action.