Search code examples
mysqlsqlmariadbsubquerysql-order-by

Why executes MariaDB all subqueries in select-statement before order by keyword, even if they are not necessary?


we switched our database from mySQL8 to MariaDB10 a week ago and now we have massive performance problems. We figured out why: we are working with subqueries in select statements and ORDER BY pretty often. Here is an example:

SELECT id, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = table.id) queryResult
FROM table
WHERE status = 5
ORDER BY column
LIMIT 10

imagine, there are 1.000.000 entries in table which are affected if status = 5.

What happens in mySQL8: ORDER BY and LIMIT execute and after that the subquery (10 rows affected)

What happens in MariaDB10: the subquery executes (1.000.000 rows affected) and after that ORDER BY and LIMIT

Both queries are returning 10 rows but under MariaDB10 it is incredible slow because of that. Why is this happing? And is there an option in MariaDB which we should activate to avoid this? I know from mySQL8 that select subqueries will be executed when they are mentioned in ORDER BY. But if not they will be executed when the resultset is there.

Info: if we do this, everything is fine:

SELECT *, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = outerTable.id) 
FROM (
    SELECT id
    FROM table
    WHERE status = 5
    ORDER BY column
    LIMIT 10
) outerTable

Thank you so much for any help.


Solution

  • After searching and searching I finally found a solution to make the mariaDB10 database working as I knew it from mySQL8.

    For those which have similar problems: set this each time you connect to the server and everything works like in mySQL8:

    SET optimizer_use_condition_selectivity = 1
    

    Long version: the problem I described at the top was suddenly solved and the subquery was executed like it was in the past under mySQL8. I did exactly nothing!

    But there were soon new problems: we have a statistic page, which was incredible slow. I noticed that an index was missing and I add it. I executed the query and it was working. Without index 100.000 rows affected for finding the results, after adding 38. Well done.

    Then strange things started to happen: I executed the query again and the database didn't use the index. So I executed it again and again. This was the result:

    1st query execution (I did it with ANALYZE): 100.000 rows affected

    2nd query execution: 38 rows affected

    3rd query execution: 38 rows affected

    4th query execution: 100.000 rows affected

    5th query execution: 100.000 rows affected

    It was complete random, even in our SaaS solution! So I startet to search how the optimizer determine an execution plan. I found this: optimizer_use_condition_selectivity

    the default for mariaDB10.4 server is 4 which means, that histograms are used to calculate the result set. I saw a few videos about it and recognized that this will not work in our case (although we stuck to database normalization). Mode 1 works well:

    use selectivity of index backed range conditions to calculate the cardinality of a partial join if the last joined table is accessed by full table scan or an index scan

    I hope this will help some other guys which despair with this like I did.