Search code examples
sqldatabaseoptimizationdb2query-optimization

DB2: Select rows where the table1 date is older than date in table2


I am trying to optimise my SQL query for use in a big DB2 table with 10,000s of records. I need to extract rows from Table 1, where the date is older than the rows in Table 2. Table 2 can have many rows that are more recent but I only care about the Table 1 data. I have a query but it's really slow and I am looking to optimise it further. I have gone for this approach:

SELECT * FROM TABLE1 t1
WHERE t1.START_DATE < (
    SELECT MAX(DATE(t2.TIMESTAMP_DATE))
    FROM TABLE2 t2
    WHERE t1.NAME = t2.NAME
);

Here is an example dataset which runs fine with low volumes but my big dataset is running too slow...

Table 1:

NAME START_DATE
JOHN 2018-01-01
JIM 2019-01-01
ALEX 2023-01-01

Table 2:

NAME TIMESTAMP_DATE
JOHN 2022-02-22-09.20.23.891000
JOHN 2022-03-22-09.20.23.891000
JOHN 2022-04-22-09.20.23.891000
JIM 2022-01-22-09.20.23.891000
JIM 2022-02-22-09.20.23.891000
ALEX 2022-05-01-09.20.23.891000

So from my query, I want to extract the follow dataset:

NAME START_DATE
JOHN 2018-01-01
JIM 2019-01-01

Could I please get any tips to optimise for a massive dataset in both tables?


Solution

  • Building a list of names and max dates from T2 and joining is more efficient, try this :

    select
      t1.* 
    from table1 t1
    inner join (
      select 
        name,
        date(max(timestamp_date)) maxdate
      from table2 t2
      group by name
    ) b on b.name = t1.name
    where t1.start_date < b.maxdate