Search code examples
mysqlmatchsubsetgreatest-n-per-groupclosest

MySQL select closest match within a subset of MySQL results


Suppose I have the following MySQL (InnoDB) table:

+-------------------------------------------------------------------+
|                          table "taskList"                         |
+----+------------------+----------+---------------------+----------+
| ID | TaskName         | Category | Date_time           | Priority |
+----+------------------+----------+---------------------+----------+
| 1  | cleanup          |   system | 2019-06-02 03:30:00 |        5 |
+----+------------------+----------+---------------------+----------+
| 2  | create_user      |   system | 2019-03-23 11:56:10 |        5 |
+----+------------------+----------+---------------------+----------+
| 3  | send_invoice     |   system | 2019-03-23 11:56:17 |        6 |
+----+------------------+----------+---------------------+----------+
| 4  | perform_selftest |   system | 2019-06-25 06:54:11 |        1 |
+----+------------------+----------+---------------------+----------+
| 5  | add_destination  |      map | 2019-02-15 16:21:04 |        2 |
+----+------------------+----------+---------------------+----------+
| 6  | verify_VIN       |  chassis | 2019-01-04 09:35:49 |        5 |
+----+------------------+----------+---------------------+----------+


I want to write a query selecting any records that match all of the following conditions (note that values between quotes are examples and will be parameterized in real world):

  • Category 'system'
  • Date_time between '2019-01-01' and '2019-07-01'
  • having the highest Priority in this subset, closest to '2' (2 itself is allowed, but not anything higher, like 1)

In this case, there are 4 records matching the first two conditions. But of those 4, only 2 match the condition for Priority. So here records #1 and #2 (and only those) should be returned.


By searching this website I've written the following query which seems to work, but it's quite ugly and I have a feeling that it could be more efficient in terms of performance:

SELECT * FROM taskList
WHERE category='system'
AND (Date_time BETWEEN '2019-01-01' AND '2019-07-01') 
AND Priority=(
   SELECT MIN(Priority) FROM taskList
   WHERE category='system'
   AND (Date_time BETWEEN '2019-01-01' AND '2019-07-01') 
   AND Priority >= 2
)
ORDER BY Date_time DESC

Does anyone know of a better way to do this?


Solution

  • Your query, that uses a correlated subquery for filtering, seems to properly fit your use case, and should have pretty good performance, especially with an index on (category, Date_time, Priority).

    If you are running MySQL 8.0, you can also try and use rank(). This gives you a more concise query (you would need to compare performance against your original query):

    SELECT *
    FROM (
            SELECT 
                t.* ,
                RANK() OVER(PARTITION BY category ORDER BY Priority) rn
            FROM taskList t
            WHERE 
                Category = 'system'
                AND Date_time BETWEEN '2019-01-01' AND '2019-07-01') 
                AND Priority >= 2
    ) t
    WHERE rn = 1
    

    NB: I added the Category to the rank partition, in case you ever need to make the query take in account more than one category.