Search code examples
sqlpostgresqlsubqueryconditional-statements

Only SELECT rows where the date is less than a date from another table


I'm refactoring some SQL to remove hardcoded dates that exist within several OR conditions, as part of an INSERT INTO statement.

For example:

OR ( site = 'site.com' AND url = 'www.someurl.com' AND date < '2021-01-01')
OR ( site = 'site.com' AND url = 'www.otherurl.com' AND date < '2019-09-09')

site = 'site.com' always stays the same.

url = 'www.someurl.com' will have different URLs

The dates will also change.

In order to remove several lines that exist like the one above, I've created a reference table that contains a list of the site names, a boolean column to mark the interesting urls, and the dates (same as the hardcoded ones)

I've sorted out everything except AND date < '2021-01-01'. How do I select rows from Table A where the url matches the url from my reference table (Table B), where the date is less than the date in the reference table?

I'm down a subquery rabbit hole with the following:

OR (site = 'site.com' AND url IN ( SELECT url_ref
                                       FROM REF_TABLE
                                       LEFT JOIN TABLE A
                                       ON url = url_ref) AND date < ( SELECT start_date  
                                                                         FROM REF_TABLE
                                                                         LEFT JOIN TABLE A
                                                                         ON url = url_ref ) )

I was able to eliminate another large OR statement that contained a list of urls, but without dates. I'm stuck on these few cases where I need to only take rows where the date is less than a certain date, as defined in the reference table.

That works like this:

            OR (ref.intersting = 't' AND url NOT IN ( SELECT url_ref
                                                         FROM REF_TABLE
                                                         WHERE url_ref IS NOT NULL ) )

Thank you!

p.s. The purpose f this is so that I can stop editing a python script every time a new exception is created. Instead, I will just add a row to the reference table and the script won't need to change.


Solution

  • You can use EXISTS:

    OR
    (outer.site = 'site.com' AND 
     EXISTS (SELECT 1
             FROM REF_TABLE r
             WHERE outer.url = r.url AND
                   outer.date < r.date
            )
    ) 
    

    outer is the table alias for the columns in the outer query.