I'm new to query optimization so I've read a lot of articles and watched YouTubes and stuff but I still can't figure it out.
The thing is I have this query that I want to execute for my API :
SELECT req_id,follow_requests.insta_id as id,caption,follow_requests.male,
users.name,users.profile_pic
FROM follow_requests
INNER JOIN users ON follow_requests.insta_id = users.insta_id
WHERE (remaining_follow>0 && req_id<='574989')
AND NOT EXISTS
(
SELECT *
FROM follows
WHERE follows.req_id=follow_requests.req_id
AND follows.follower_insta_id='3134816134'
)
ORDER BY req_id DESC
LIMIT 20
so I logged my database and digest showed that this one is eating +42% of my resources some instance of this query look like 300 sec and it had an average of 20 sec!, (they got stuck on sending data status) so I tried to change it for better (not that I really I know what I'm doing :) and I changed it to this
SELECT fr.req_id, fr.insta_id as id, fr.caption, fr.male, u.name,
u.profile_pic, fr.date, fr.remaining_follow
FROM follow_requests AS fr
INNER JOIN users AS u ON fr.insta_id=u.insta_id
WHERE fr.remaining_follow>0
AND fr.insta_id!=3134816134
AND NOT EXISTS
(
SELECT *
FROM follows as f
WHERE f.req_id=fr.req_id
AND f.follower_insta_id='3134816134'
)
ORDER BY fr.date DESC
LIMIT 20
the strange thing is, it was ok the first few runs! like from 0.1 sec till 0.7s on test runs but after I went to stretch a bit and came back to finalize it, this was also stuck for +200,300 sec but on sorting status this time I checked query with explain and I noticed it's not using date index! if only I remove 'order by' part from query it becomes really fast ~0.2s
any Ideas?
by the way, almost all columns are indexed and follow table has 2M rows other have a like +100k
sorry If I couldn't explain it well or I missed to give you some important info, I'm just new ;)
I would be very grateful if you could explain it in some way a newbie can understand while not missing out important details :)
Thanks in advance
=-=-=-=-=-=-= Update 1 =-=-=-=-=-=-=
Thanks to all your help I reached this query now which is time efficient but I had to change the whole thing a little but it might still have a problem
SELECT follow_requests.req_id, follow_requests.insta_id as id,
caption, follow_requests.male, users.name,users.profile_pic
FROM follow_requests
INNER JOIN users ON follow_requests.insta_id = users.insta_id
LEFT OUTER JOIN follows ON follows.req_id = follow_requests.req_id
AND follows.follower_insta_id = '3134816134'
WHERE follow_requests.remaining_follow >0
AND follow_requests.insta_id != 3134816134
AND (follows.follower_insta_id != 3134816134 || follows.follower_insta_id is null)
ORDER BY follow_requests.date DESC
LIMIT 20
hmm, first i need to say that i didnt write the orginal query (first 1) and i dont understand all of it myself :/ but the main programmer is out of reach right now and i have to take his place or the whole project will fail
about the problem is everybody can submit multiple requests for an insta_id but should only return one of them the main code and @Kickstart code (which is optimized version of the original one with ave respond time of ~10 sec) does it right (for some reason I don't understand but my code returns multiple instances of the same insta_id (with different req_id)
And alsocould some body enlighten me why the first query is doing so bad?
Really need to see the table declares and the EXPLAINs of the queries.
However it is possibly worth trying a LEFT OUTER JOIN of the follows table, and checking for no match (by checking for a column that must be populated on that table for being NULL).
However do you have an index on the follows table that covers both the req_id AND the follower_insta_id fields? Not 2 separate indexes, but one that covers both fields.
SELECT req_id,
follow_requests.insta_id as id,
caption,
follow_requests.male,
users.name,users.profile_pic
FROM follow_requests
INNER JOIN users ON follow_requests.insta_id = users.insta_id
LEFT OUTER JOIN follows ON follows.req_id = follow_requests.req_id AND follows.follower_insta_id = '3134816134'
WHERE follows.req_id IS NULL
AND remaining_follow > 0
AND req_id <= '574989'
ORDER BY req_id DESC
LIMIT 20