Search code examples
sqlmysqlquery-optimization

Optimizing INNER JOINS SQL query


I have a query like this:

SELECT DISTINCTROW  
    customers.id,
    customers.customer_name  
FROM customers  
    INNER JOIN rentals ON customers.id = rentals.id_customer  
    INNER JOIN rentals_billing ON rentals.id = rentals_billing.id_rental  
    INNER JOIN vehicles ON vehicles.id = rentals.id_vehicle  
WHERE  
    (rentals_billing.customer_id_facsimile IS NULL AND rentals_billing.rental_payable = 1) 
    OR (
       (rentals_billing.missing_rent_id_doc IS NULL OR missing_rent_id_doc <= 0) 
        AND missing_rent_fee > 0
     ) 
    AND customers.delete_date IS NULL
    AND rentals.delete_date IS NULL
    AND vehicles.delete_date IS NULL
    AND (rentals.mode = 'MODEA' OR rentals.mode = 'MODEB' OR rentals.mode = 'MODEC')

Running on a MySQL server 5.6.13 (on Windows Server 2008 R2 SP1) the virtual server is setup with 4Gb of RAM and a 2.1GHz processor (I think single core).

At the present time:

  • table customers has around 3k records,
  • table rentals has around 150k records,
  • table rentals_billing has around 150k records,
  • table vehicles has around 8k records

The query frequently goes in timeout (tooks more than 10-20 seconds).

I tried indexing some of the tables and keys involved in the query but without much luck. There's a way to optimize such query ?

Best regards, Mike


Solution

  • First of all, AND has precedence over OR, so you have

    WHERE
      (customer_id_facsimile IS NULL AND rentals_billing.rental_payable = 1)
    OR 
      (
        (missing_rent_id_doc IS NULL OR missing_rent_id_doc <= 0) AND missing_rent_fee > 0)
        AND customers.delete_date IS NULL
        AND ...
      )
    

    While this may be what you want, I guess you'll rather want

    WHERE
    (
      (customer_id_facsimile IS NULL AND rentals_billing.rental_payable = 1)
     OR 
      ((missing_rent_id_doc IS NULL OR missing_rent_id_doc <= 0) AND missing_rent_fee > 0)
    )
    AND customers.delete_date IS NULL
    AND ...
    

    Then DISTINCTROW (or more commonly just DISTINCT) is very often a sign for a poorly written query where someone joins several tables, produces loads of duplicate rows and then must get rid of the duplicates they created themselves. Such unnecessary big intermediate results can slow down a query a lot. And this is what you are doing here.

    You want to select from customers, but only those that match certain criteria. So why the joins? Select FROM customers and filter in WHERE.

    SELECT *
    FROM customers c
    WHERE c.delete_date IS NULL
    AND EXISTS
    (
      SELECT NULL
      FROM rentals r
      JOIN rentals_billing rb on rb.id_rental = r.id
      JOIN vehicles v on v.id = r.id_vehicle AND v.delete_date IS NULL
      WHERE r.id_customer = c.id
      AND r.delete_date IS NULL
      AND r.mode IN ('MODEA', 'MODEB', 'MODEC')
      AND 
      (
        (???.customer_id_facsimile IS NULL AND rb.rental_payable = 1)
       or 
        ((rb.missing_rent_id_doc IS NULL OR rb.missing_rent_id_doc <= 0) AND ???.missing_rent_fee > 0)
      )
    );  
    

    I've put ??? where I don't know which table the column belongs to.

    To speed up the lookup you should have this index:

    CREATE INDEX idx ON rentals (id_customer, mode, delete_date, id_vehicle);
    

    Or, if most rentals are deleted:

    CREATE INDEX idx ON rentals (id_customer, delete_date, mode, id_vehicle);
    

    Well, important is that id_customer comes first, because this is the column for which the lookup shall be made. These indexes are called covering indexes, because they contain all the columns you are using in the query. The first three are for the quick lookup, the id_vehicle then is for the join to vehicles, because you want to ensure that the rental is on a non-deleted vehicle. The rental ID that you need for the join to rentals_billing is silently included in the indexes in MySQL. In other DBMS you would have to explicitely add it.