Search code examples
mysqlduplicatesleft-joinintervalsdateadd

How to join two tables based on date? MYSQL


I'm trying to join two tables that have check date in both. The results should show only if the check date in the first table is equal to the check date in the second table. OR if the check date in both tables is within 10 days of each other.

I have two tables:

t1 table: company_id, expected_checkdate

t2 table: company_id, actual_checkdate
  • Condition 1: acutal_checkdate = expected_checkdate
  • Condition 2: acutal_checkdate within 10 days of expected_checkdate
  • Condition 3: expected_checkdate within 10 days of acutal_checkdate
  • Condtion 4: if acutal_checkdate = expected_checkdate then don't check for others that are within 10 days

      LEFT OUTER JOIN t2 ON t1.actual_checkdate = t2.expected_checkdate
           OR t1.actual_checkdate 
           BETWEEN DATE_ADD(t2.expected_checkdate, INTERVAL -10 DAY) 
             AND DATE_ADD(t2.expected_checkdate, INTERVAL 10 DAY)
           AND t1.company_id = t2.company_id
    

The problem is when I run this for one month. I'm seeing a lot of duplicates because there can be 2 actual_checkdate entries or 2 expected_checkdate in one month.

      |---------------------|------------------|------------------|
      |     company_id      | actual_checkdate |expected_checkdate|
      |---------------------|------------------|------------------|
      |          12         |    2018-01-05    |    2018-01-05    |
      |---------------------|------------------|------------------|
      |          12         |    2018-01-19    |    2018-01-19    |
      |---------------------|------------------|------------------|
      |          12         |    2018-01-05    |    2018-01-19    | -- incorrect 
      |---------------------|------------------|------------------|
      |          12         |    2018-01-19    |    2018-01-05    | -- incorrect 
      |---------------------|------------------|------------------|
      |          13         |    2018-01-12    |    2018-01-20    | 
      |---------------------|------------------|------------------|
      |          14         |    2018-01-26    |    2018-01-36    | 
      |---------------------|------------------|------------------|

The first two and last two rows are correct. The third and fourth rows shouldn't display because they are duplicates of the first and second row. Please help me with the join above.


Solution

  • You could rewrite your JOIN condition as:

    FROM t1
    JOIN t2 ON t1.company_id = t2.company_id AND
       (t1.actual_checkdate = t2.expected_checkdate OR
        t1.actual_checkdate BETWEEN 
           DATE_SUB(t2.expected_checkdate, INTERVAL 10 DAY) 
           AND DATE_ADD(t2.expected_checkdate, INTERVAL 10 DAY)
        AND NOT EXISTS (SELECT * FROM t1 WHERE t1.actual_checkdate = t2.expected_checkdate AND t1.company_id = t2.company_id)
     )
    

    The NOT EXISTS clause will prevent matching on a nearby date if an actual match exists.