I have a query that is taking 48 seconds to execute as follows:
SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days
FROM tmd_logins
join tmd_users on tmd_logins.userID = tmd_users.userID
where tmd_users.isPatient = 1 AND loggedIn > '2011-03-25'
and tmd_logins.userID in
(SELECT userID as accounts30Days FROM tmd_users
where isPatient = 1 AND created > '2012-04-29' AND computerID is null)
When I remove the DISTINCT
keyword it takes less than 1 second, so it seems the bottle neck lies within that.
The database adds an entry to the tmd_logins
table every time a user logs into the system. I am trying to get a total count of all users that are patients that have been created and logged in within a given period of time, such as in the last 30 days.
I have tried removing the DISTINCT keyword and adding group by tmd_logins.userID
to the statement but the performance issue remains.
Table tmd_logins
has about 300,000 records, tmd_users
has about 40,000
Is there a better way of doing this?
The problem that you have is the execution plan. My guess is that the "in" clause might be confusing it. You might try:
SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days
FROM tmd_logins join
tmd_users
on tmd_logins.userID = tmd_users.userID join
(SELECT distinct userID as accounts30Days
FROM tmd_users
where isPatient = 1 AND
created > '2012-04-29' AND
computerID is null
) t
on tmd_logins.userID = t.accounts30Days
where tmd_users.isPatient = 1 AND
loggedIn > '2011-03-25'
That might or might not work. However, I'm wondering about the structure of the query itself. It would seem that UserID should be distinct in a table called tmd_users. If so, then you can wrap all your conditions into one:
SELECT count(DISTINCT tmd_logins.userID) as totalLoginsUniqueLast30Days
FROM tmd_logins join
tmd_users
on tmd_logins.userID = tmd_users.userID
where tmd_users.isPatient = 1 AND
loggedIn > '2011-03-25' and
created > '2012-04-29' AND
computerID is null
If my guess is true, then this should definitely run faster.