Search code examples
mysqlmariadbquery-optimization

How can I make sorting by a column in a joined table faster?


I've got a website with a bunch of products, each of which is scored using a version of an algorithm and when listed, is joined on the following table:

CREATE TABLE `product_score` (
  `product_id` int(10) UNSIGNED NOT NULL,
  `cohort_id` int(10) UNSIGNED NOT NULL,
  `score` float UNSIGNED NOT NULL,
  `last_updated` int(10) UNSIGNED NOT NULL DEFAULT 1709444313,
  `order` int(10) UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ALTER TABLE `product_score`
  ADD PRIMARY KEY (`product_id`,`cohort_id`),
  ADD KEY `score` (`score`),
  ADD KEY `order` (`order`),
  ADD KEY `product_id` (`product_id`,`cohort_id`,`order`);

There are roughly 20k products, each of which has a unique "order" (0-20000) depending on the cohort_id which indicates the version of the algorithm that the entry is for.

An example problematic query:

SELECT product.*, user.*, additional_data.* FROM product
INNER JOIN product_score AS Score ON Score.product_id = product.product_id AND Score.cohort_id = '24') 
LEFT JOIN user ON user.user_id = product.user_id
LEFT JOIN additional_data...
ORDER BY Score.order LIMIT 20

It is snappy. However, with the order by, it takes 1-2 seconds.

I don't see any room for improvement as far as additional indexing. I've run an EXPLAIN on the query, and the product_id key is being properly utilized in the join as well.

Something I noticed is that when removing the left joins, the query only takes 0.2 seconds instead of 1 (still slow compared to without the order by, but not nearly as slow). However, that data is necessary for queries, so I can't simply remove it. I'm also not knowledgable enough about DB io to infer what it means or how best to resolve it.

Here's the profiling information for an example query: Profiling information


Solution

  • Reorder INDEX: Replace

     ADD KEY `product_id` (`product_id`,`cohort_id`,`order`);
    

    With

    INDEX(cohort_id, `order`, product_id)
    

    Turn query inside out: Often when using GROUP BY or ORDER BY LIMIT nn on a JOIN, a performance improvement is to push such into a "derived table" (FROM ( SELECT ... )) that generates only a few rows (cf LIMIT or GROUPing), then do the JOINs to find other desired columns. Try to write the derived table with as little effort as possible (minimal or no JOINs, perhaps delivering only ids, etc). If necessary, pick up other columns by JOINing back to the inner table; etc.

    If the does not speed it up enough, rewrite the query to find the 20 rows in a subquery before looking at the other columns or tables:

    SELECT product.*, user.*, additional_data.*
        FROM ( SELECT product_id, `order`
                FROM  Score.cohort_id = '24'
                ORDER BY  Score.order
                LIMIT  20 ) AS s1
        JOIN product USING(product_id)
        LEFT JOIN user USING(user_id)
        LEFT JOIN  additional_data...
        ORDER BY  s1.order
    

    (In case you had not noticed, "profiling" does not provide any actionable information.)