Search code examples
performanceindexingmysql-5.7right-join

MySQL right join slow performance


I have two tables:

Restaurants and SurveyInvitation.

One restaurants has many survey invitation.

I want to select all the restaurants that have a survey invitation and the status of it is 'approved', 'completed', 'hidden_review' .

The restaurants table has ~1400 rows and the survey invitation ~2.4 milion rows.

This is my query

SELECT  `Restaurant`.`id`
FROM   `restaurants` AS `Restaurant`
RIGHT JOIN `survey_invitations` AS `SurveyInvitations`
           ON ( `SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
                AND `SurveyInvitations`.`status` 
                IN (
                    'approved', 'completed', 'hidden_review'
                   ) 
               )
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

This runs in 1.235 sec.

Running explain gives

https://jsfiddle.net/bjuepb9j/3

I also tried this but no luck still 1.2 sec

SELECT  `Restaurant`.`id`
FROM   `db_portal`.`restaurants` AS `Restaurant`
RIGHT JOIN  (
    select `restaurant_id` from `survey_invitations` AS `SurveyInvitations`
    where `SurveyInvitations`.`status` 
    IN ('approved', 'hidden_review', 'completed')
)  AS `SurveyInvitations`
ON (
`SurveyInvitations`.`restaurant_id` = `Restaurant`.`id`
)
WHERE  `Restaurant`.`country_id` = 53
AND `Restaurant`.`area_id` IN ( 1, 16, 27, 118,
                                   219, 221, 222, 223,
                                   224, 225, 230, 231,
                                   235, 236, 237, 238,
                                   239, 240, 248, 226,
                                   241, 244, 246, 227,
                                   245, 228, 229, 242,
                                   243, 249 )

group by `Restaurant`.`id`

Explain is the same.

In the fiddle there is also the result from show index on both of the tables.

1.2 sec for a ~2.4 million rows is to much I think. Maybe the indexes are wrong, I'm not that good at this kind of stuff.

Edit.1. https://jsfiddle.net/bjuepb9j/6/

Has show create table and show columns of survey_invitations


Solution

  • Use exists:

    SELECT r.id
    FROM restaurants r
    WHERE r.country_id = 53 AND
          r.area_id IN (1, 16, 27, 118, 219, 221, 222, 223,
                        224, 225, 230, 231, 235, 236, 237, 238,
                        239, 240, 248, 226, 241, 244, 246, 227,
                        245, 228, 229, 242, 243, 249
                       ) AND
          EXISTS (SELECT 1
                  FROM survey_invitations si
                  WHERE si.restaurant_id = r.id AND
                        si.status IN ('approved', 'completed', 'hidden_review') 
                 );
    

    Then, for this query you want indexes on restaurants(country_id, area_id, id) and survey_invitations(restaurant_id, status).

    A right join is entirely unneeded for your query. The where clause turn it into an inner join anyway. In all likelihood, the expense of the query is in the group by. This version eliminates that.