I have three large tables as follows...
property
--------
property_id
other_prop_data
transfer_property
-----------------
property_id
transfer_id
transfer
--------
transfer_id
contract_date
transfer_price
I want to return a list of unique property IDs for all Transfers that occurred between '2012-01-01' and '2012-06-30'. Here's the code I have so far...
SELECT *
FROM property p
JOIN
(
SELECT t.transfer_id, t.contract_date, t.transfer_price::integer, tp.property_id
FROM transfer t
LEFT JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
) transfer1 ON transfer1.property_id = p.property_id
AND NOT EXISTS
(
SELECT transfer2.transfer_id
FROM
(
SELECT t.transfer_id, t.contract_date, t.transfer_price::integer, tp.property_id
FROM transfer t
LEFT JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
) AS transfer2
WHERE transfer2.property_id = transfer1.property_id
AND transfer2.contract_date > transfer1.contract_date
)
This works (I think) but is very slow.
I have found several similar queries in... https://stackoverflow.com/questions/tagged/greatest-n-per-group ...but most of the ones I found were self joins with the same table, not joined relational tables as above.
I know in MySQL you can use User Variables, but I do not know how to do this in PostgreSQL, or if it is the ideal solution in this case.
Does anybody have any suggestions around how to improve this query (or even how to do it using a completely different method than mine above)?
Any help is very much appreciated. Thanks!
Regards,
Chris
PS: have also tried variations on DISTINCT and MAX, but not convinced they were picking records with the most recent date with the way I was using them.
EDIT: Sorry folks, I should also add that I am running my queries in PGADMIN 1.12.3
"I want to return a list of unique property IDs for all Transfers that occurred between '2012-01-01' and '2012-06-30'."
To me, that appears as:
SELECT DISTINCT tp.property_id
FROM transfer t
JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
;
Now put that in a CTE or subquery, and you are done:
WITH x1 AS (
SELECT DISTINCT tp.property_id AS property_id
FROM transfer t
JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
)
SELECT ...
FROM property p
JOIN x1 ON x1.property_id = p.property_id
;
I don't understand the purpose of the NOT EXISTS subquery. You are only interested in the MAX?
UPDATE: It appears (from the title) you only want the maxdate. Could be done by your not exist construct, or by this MAX(...) in the subquery; like ... :
WITH m1 AS (
SELECT DISTINCT tp.property_id AS property_id
, MAX(t.contract_date) AS contract_date
FROM transfer t
JOIN transfer_property tp ON tp.transfer_id = t.transfer_id
WHERE t.contract_date BETWEEN '2012-01-01' AND '2012-06-30'
GROUP BY tp.property_id
)
SELECT ...
FROM property p
JOIN m1 ON m1.property_id = p.property_id
;