Search code examples
sqlpostgresqlgreatest-n-per-group

Replace nested query to get latest related row


I have a query to get all penalty data with latest prices. In prices table there are many prices related to a product. To take the valid price I need to get the latest price. Here is the query.

SELECT *
FROM product 
LEFT JOIN product_price productPrice
          ON product.id = productPrice.product_id
          AND productPrice.valid_from =
               (SELECT valid_from FROM product_price
                WHERE product_price.is_active = true AND valid_from <= now()
                ORDER BY valid_from DESC LIMIT 1)
WHERE product.id = 1;

My question: Is there any way to replace the nested query here in order to take only one latest date from the product_price table or is it ok to use nested queries.


Solution

  • To retrieve data for a single given product, a LATERAL subquery is typically fastest. Most flexible, too.

    SELECT *
    FROM   product p
    LEFT   JOIN LATERAL (
       SELECT *
       FROM   product_price pp
       WHERE  pp.product_id = p.id
       AND    pp.is_active
       AND    pp.valid_from <= now()
       ORDER  BY pp.valid_from DESC
       LIMIT  1
       ) ON true
    WHERE  p.id = 1;
    

    See:

    Be sure to have an index on product_price (product_id, valid_from), or some refined version, depending on undisclosed details. A PK index on product(id) can be assumed.

    Retrieving most or all products at once is typically faster with a different query style. But your given query clearly indicates a single product.

    Subtle differences

    Your original returns all related rows from product_price with the latest valid_from. My query always picks a single one. Typically, (product_id, valid_from) should be defined UNIQUE to make this unambiguous.

    Your original also returns no row from product_price where any related row with valid_from IS NULL exists. Null sorts on top in descending sort order and the subsequent join condition valid_from = null eliminates all rows. You would have a NOT NULL constraint or at least use DESC NULLS LAST in the query to sort null values last. See:

    (And you should declare all of that in the question.)