I have a table line_item { id: int, price: decimal, quantity: int, [other:...] }
. This table is very huge, approx. 28 million rows. Now I want to get top 1000 rows order by f(price, quantity, [other...])
, f
is a arbitrary function. What is the best way to do that?
I thought 2 solutions:
order by
and limit
. This way may be slow because I think MySQL calculates result of f
for every row then sorts them.f
. This way is not good for scalability because maybe I want to use multiple function f
(f1
, f2
...) in different context.I really hope there is third solution which is better than them.
(Sorry, this is a negative answer, but that's life.)
If you will accept the "best solution" being only twice as fast as what you have experienced, then Accept @Zsuzsa's.
I'm here to tell you that it cannot be optimized without doing something about f(...). Here's why:
The optimizer sees no WHERE clause, but sees an ORDER BY with an expression. So, it realizes that the only way to evaluate the query is to do a "table scan" (that is, read all the rows), evaluate the function for each row, save the results in a tmp table (with 28M rows), sort that tmp table, and deliver 1000 rows.
Can any of that function be copied into the WHERE clause for filtering out some rows? If so, the tmp table might be smaller. Or, if you are lucky, perhaps some INDEX can be devised so that it does not have to do a full table scan.
Are you modifying all the rows? Or is this sort of a "write only" table? That is, does a row, once written, never change? Building on that, can f() pre-computed for all the 'old' rows? If so, store it somewhere and add an index -- Poof! Instant results.
Is a common part of f() a test for some date range? (Big tables often have some kind of date. Queries on big tables often ask about "recent" items.) If so, can that be pulled out of f(). Then we can consider PARTITIONing the table by date. That way, even if nothing else can be optimized in f, "partition pruning" could limit the number of rows to deal with.
Please SHOW CREATE TABLE and discuss whether some of the ideas here might be feasible.