Search code examples
postgresqljoinsubqueryrelationalmaxdate

How do I improve query to select records with MAX date that uses joined tables in PostgreSQL?


I have three large tables as follows...

property
--------
property_id
other_prop_data

transfer_property
-----------------
property_id
transfer_id

transfer
--------
transfer_id
contract_date
transfer_price

I want to return a list of unique property IDs for all Transfers that occurred between '2012-01-01' and '2012-06-30'. Here's the code I have so far...

SELECT *
FROM property p
JOIN
(
  SELECT t.transfer_id, t.contract_date, t.transfer_price::integer, tp.property_id
  FROM transfer t
  LEFT JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
  WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
) transfer1 ON transfer1.property_id = p.property_id

AND NOT EXISTS
(
  SELECT transfer2.transfer_id
  FROM
  (
    SELECT t.transfer_id, t.contract_date, t.transfer_price::integer, tp.property_id
    FROM transfer t
    LEFT JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
    WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
  ) AS transfer2
  WHERE transfer2.property_id = transfer1.property_id
  AND transfer2.contract_date > transfer1.contract_date
)

This works (I think) but is very slow.

I have found several similar queries in... https://stackoverflow.com/questions/tagged/greatest-n-per-group ...but most of the ones I found were self joins with the same table, not joined relational tables as above.

I know in MySQL you can use User Variables, but I do not know how to do this in PostgreSQL, or if it is the ideal solution in this case.

Does anybody have any suggestions around how to improve this query (or even how to do it using a completely different method than mine above)?

Any help is very much appreciated. Thanks!

Regards,

Chris

PS: have also tried variations on DISTINCT and MAX, but not convinced they were picking records with the most recent date with the way I was using them.

EDIT: Sorry folks, I should also add that I am running my queries in PGADMIN 1.12.3


Solution

  • "I want to return a list of unique property IDs for all Transfers that occurred between '2012-01-01' and '2012-06-30'."

    To me, that appears as:

    SELECT DISTINCT tp.property_id
      FROM transfer t
      JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
      WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
         ;
    

    Now put that in a CTE or subquery, and you are done:

    WITH x1 AS (
          SELECT DISTINCT tp.property_id AS property_id
          FROM transfer t
          JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
          WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
          )
    SELECT ...
    FROM property p
    JOIN x1 ON x1.property_id = p.property_id
        ;
    

    I don't understand the purpose of the NOT EXISTS subquery. You are only interested in the MAX?

    UPDATE: It appears (from the title) you only want the maxdate. Could be done by your not exist construct, or by this MAX(...) in the subquery; like ... :

    WITH m1 AS (
          SELECT DISTINCT tp.property_id AS property_id
            , MAX(t.contract_date) AS contract_date
          FROM transfer t
          JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
          WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
            GROUP BY tp.property_id
          )
    SELECT ...
    FROM property p
    JOIN m1 ON m1.property_id = p.property_id
        ;