Being pretty new to database optimization in systems with high load I have the following question - lets assume we have the following query (query is with sample data):
SELECT *
FROM ticket
WHERE ticket_status='draft'
AND user_id='789437879'
ORDER BY ticket_id DESC LIMIT 0, 15
We already have the following indizes:
CREATE INDEX ticket_status on ticket(ticket_status);
CREATE INDEX user_id on ticket(user_id);
CREATE INDEX ticket_id on ticket(ticket_id);
Would there be a significant performance benefit for optimizing this query if we do the following:
CREATE INDEX make_that_query_more_efficient on ticket(user_id,ticket_status);
Or does it make barely any difference as all the columns are indexed anyways?
It depends on the query. But definitely in your example.
In many queries, it makes a huge difference.
Here is one discussion on such: http://mysql.rjweb.org/doc.php/index1 . It discusses various indexing strategies on a simple query. The conclusion is that a composite index is clearly optimal for the query.
Meanwhile, there are dozens, maybe hundreds, of examples in this forum where my solution to a performance problem (high CPU, slow query, etc) is primarily to replace a single-column index with a multi-column index. The performance gain is sometimes spectacular when WHERE
, ORDER BY
, and LIMIT
can all be handled by the INDEX
.
A many-to-many mapping table ('junction') is a very common schema pattern that novices fail to index properly. More on it: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
For your query, this would make a world of difference:
INDEX(user_id, ticket_status, -- these two can be in either order
ticket_id) -- this needs to be last
The execution will quickly drill into the Index's BTree to the row(s) with ticket_status='draft' AND user_id='789437879'
. It will start at the end of such items and scan backward (DESC
), picking up 15 (or fewer) items. Then it will look up the other columns (*
) and deliver them.
Almost any other index would require scanning more than just 15 items.
As for your indexes.
If ticket_id
is already the PRIMARY KEY
, do not add INDEX(ticket_id)
; it will be useless.
If any of your indexes is a prefix of the index I recommend, DROP it; it will be redundant and in the way. (Use EXPLAIN SELECT
to see this.)
If the Optimizer picked your (ticket_status)
, it would look at all the entries with desired status, filter based on user_id
, sort the results, then peel off 15 rows.
Similarly for (user_id)
If the Optimizer were to use INDEX(ticket_id)
, it would start at the end of the ids and work backward. If there were not 15 relevant rows, it would not stop until it scanned the entire table.
Note that my composite index even avoids the sort.
The rest of the indexes may or may not be useful; it depends on whether other queries can use them.
A suitable index is likely to be much more beneficial to a SELECT
than a burden on INSERTs
; so don't worry about this tradeoff.
Starting with user_id
may be useful to other queries; starting with status
seems less likely. The single-column INDEX(ticket_status)
is unlikely to be used by any query due to "cardinality".
My 3-column index is likely to be much better than your similar 2-column index. Mine takes care of the ORDER BY
and the LIMIT
; yours needs to gather lots of rows and sort them.
If there are big TEXT
columns in the *
, the performance difference may be more dramatic.