Search code examples
sqlgreatest-n-per-group

SQL- Query to fetch the row having a maximum date below a certain date


Sorry I couldn't explain this better in the title. Here is essentially what I need to accomplish:

Entity table: entity_id

BenchHistory table: entity_id, bench_id, bench_effective_date

Summary table: entity_id, effective_date

That is the database layout in a nutshell. This query finds each entity's effective_date from the Summary table. It also needs to find the bench_id for that particular date, by looking through the BenchHistory table and finding the maximum bench_effective_date that is less than effective_date where BenchHistory.entity_id = Entity.entity_id.

Here is what I have:

SELECT
    Entity.entity_id
    Summary.effective_date
    BenchHistory.bench_id
FROM
    Entity
JOIN Summary ON Entity.entity_id = Summary.entity_id
JOIN BenchHistory ON Entity.entity_id = BenchHistory.entity_id

Pretty simple, the bench part is what I'm having trouble with. How do I select only one BenchHistory.bench_id, that must be the most recent relative to Summary.effective_date?

For clarity, Each entity has many corresponding effective_dates and many corresponding bench_ids, but only one bench_id can be "valid" at a time (the most recent one). I am trying to find each row's "valid" bench_id depending on that row's effective_date. I need to do this by determining which bench_id has a bench_effective_date less than effective_date.

Here is an example of a query that I have, maybe it will be easier to visualize then.

SELECT
    Entity.entity_id
    BenchHistory.bench_id
    Summary.effective_date
    BenchHistory.bench_effective_date
FROM
    Entity
JOIN Summary ON Entity.entity_id = Summary.entity_id
JOIN BenchHistory ON Entity.entity_id = BenchHistory.entity_id

This will give output like:

entity_id    bench_id    effective_date    bench_effective_date
1            120         1/31/2011         6/30/2003
1            121         1/31/2011         3/22/2005
1            122         1/31/2011         11/03/2008
1            123         1/31/2011         1/21/2011
1            124         12/30/2010        5/15/2010
1            125         12/30/2010        10/06/2010

What I want to grab, is just bench_id 123 for 1/31, as it is the most recent bench for entity_id = 1 and bench_id 125 for 12/30, etc. So a result set:

entity_id    bench_id    effective_date    bench_effective_date
1            123         1/31/2011         1/21/2011
1            125         12/30/2010        10/06/2010

Thank you, sorry if this is an easy question but I have been working on it for 6 hours trying all sorts of subqueries, aggregates, GROUP BY, and whatever. I am not that experienced with SQL.

:)


Solution

  • This is NOT an easy problem, it took me a lot time too. Basically the same question as Mysql - Need to get latest of table A when referenced from table B and is related to mysql feature request http://bugs.mysql.com/bug.php?id=2020 , see it for info.

    Maybe easiest for you is to follow this example:

    Suppose you have table with price for each goods in each store. For each goods, you want to see the minimal price and the related store, in which you get it for the price! Exactly the same as in your example - you want a record with max revision.

    create table prices ( goods varchar(10), price double, store varchar(10) );
    
    insert into prices values ('car', 200, 'Amazon'), ('car', 150, 'CarStore'), ('Bread', 2, 'Baker1'), ('Bread', 1, 'Baker2');
    
    select goods, min(price), 
        (select store 
         from prices as p 
         where p.goods = prices.goods
         order by price limit 1) as store
    from prices
    group by goods;