Search code examples
mysqlquery-optimization

Should I avoid ORDER BY in queries for large tables?


In our application, we have a page that displays user a set of data, a part of it actually. It also allows user to order it by a custom field. So in the end it all comes down to query like this:

SELECT name, info, description FROM mytable
WHERE active = 1 -- Some filtering by indexed column
ORDER BY name LIMIT 0,50; -- Just a part of it

And this worked just fine, as long as the size of table is relatively small (used only locally in our department). But now we have to scale this application. And let's assume, the table has about a million of records (we expect that to happen soon). What will happen with ordering? Do I understand correctly, that in order to do this query, MySQL will have to sort a million records each time and give a part of it? This seems like a very resource-heavy operation.

My idea is simply to turn off that feature and don't let users select their custom ordering (maybe just filtering), so that the order would be a natural one (by id in descending order, I believe the indexing can handle that).

Or is there a way to make this query work much faster with ordering?

UPDATE:

Here is what I read from the official MySQL developer page.

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

....

The key used to fetch the rows is not the same as the one used in the ORDER BY:

SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

So yes, it does seem like mysql will have a problem with such a query? So, what do I do - don't use an order part at all?


Solution

  • The 'problem' here seems to be that you have 2 requirements (in the example)

    • active = 1
    • order by name LIMIT 0, 50

    The former you can easily solve by adding an index on the active field The latter you can improve by adding an index on name

    Since you do both in the same query, you'll need to combine this into an index that lets you resolve the active value quickly and then from there on fetches the first 50 names.

    As such, I'd guess that something like this will help you out:

    CREATE INDEX idx_test ON myTable (active, name)
    

    (in theory, as always, try before you buy!)

    Keep in mind though that there is no such a thing as a free lunch; you'll need to consider that adding an index also comes with downsides:

    • the index will make your INSERT/UPDATE/DELETE statements (slightly) slower, usually the effect is negligible but only testing will show
    • the index will require extra space in de database, think of it as an additional (hidden) special table sitting next to your actual data. The index will only hold the fields required + the PK of the originating table, which usually is a lot less data then the entire table, but for 'millions of rows' it can add up.
    • if your query selects one or more fields that are not part of the index, then the system will have to fetch the matching PK fields from the index first and then go look for the other fields in the actual table by means of the PK. This probably is still (a lot) faster than when not having the index, but keep this in mind when doing something like SELECT * FROM ... : do you really need all the fields?
    • In the example you use active and name but from the text I get that these might be 'dynamic' in which case you'd have to foresee all kinds of combinations. From a practical point this might not be feasible as each index will come with the downsides of above and each time you add an index you'll add supra to that list again (cumulative).

    PS: I use PK for simplicity but in MSSQL it's actually the fields of the clustered index, which USUALLY is the same thing. I'm guessing MySQL works similarly.