Search code examples
mysqlsubquerycorrelated-subqueryin-subquery

Can Someone Help me Optimize this mysql statement?


I have one table that I'm using to build groups with in my database. The table contains a list of group names and ids. I have another table that has users, and a third table showing the relationships. (userid, groupid).

The situation is this, I need to create a list of userids that belong to a specific subset of groups. So for instance, I want all users that are in group 1, 3, and 8. That is straight forward enough. It gets more complicated though, I may need a list of all users that are in groups 1, 3, and 8, or 1, 2, and 8. Then I might need to exclude users that fit that criteria, but are also in group 27.

So I've got a script dynamically creating a query, using sub queries that works to a point. I have two problems with it. I don't think I'm handling the not-in part properly, because as I ad criteria, eventually, it just kinda hangs. (I think this is a result of me using sub-selects instead of joins, but I could not figure out how to build this with joins.)

Here is an example of a query with 4 ANDed OR groups, and 2 NOT clauses.

Please let me know if there is a better way to optimize this stmt. (I can handle the dynamic building of it in PHP)

If I need to clarify anything or provide more details, let me know.


select * from users_table where username IN
(
    select user_id from
    (
        select distinct user_id from group_user_map where user_id in 
        (
            select user_id from 
            (
                select * from 
                (
                    select count(*) as counter, user_id from  
                    (
                        (
                            select distinct(user_id) from group_user_map where group_id in (2601,119)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (58,226)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (1299,525)
                        ) 
                        union all
                        (
                            select distinct(user_id) from group_user_map where group_id in (2524,128)
                        ) 
                    ) 
                    thegroups group by user_id
                ) 
                getall where counter = 4
            ) 
            getuserids
        ) 
        and user_id not in 
        (
            select user_id from group_user_map where group_id in (2572)
        ) 
    ) 
    biggergroup 
);

Note, the first part of the query is comparing an id to a username. This is because I have the usernames stored as id's from the other table. (This whole thing is a link between two completely different databases).

(Also, if it looks like I have any extra sub-queries, that was to try to force mysql to evaluate the inner queries first.)

Thanks.

Aaron.


Solution

  • Avoiding subselects used for IN clauses:-

    SELECT * 
    FROM users_table
    INNER JOIN 
    (
        SELECT Sub1.user_id 
        FROM (
                SELECT COUNT(*) AS counter, user_id   
                FROM (
                    SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)
                    UNION ALL
                    SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)
                    UNION ALL
                    SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)
                    UNION ALL
                    SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)
                ) thegroups
                GROUP BY user_id
                HAVING counter = 4
        ) Sub1
        LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
        ON group_user_map.user_id = Sub2.user_id
        WHERE Sub2.user_id IS NULL
    ) Sub3
    ON  users_table.username = Sub3.user_id
    

    Or avoiding using the COUNTs to check that the user id exists in all 4 tables, instead using inner joins

    SELECT * 
    FROM users_table
    INNER JOIN 
    (
        SELECT Sub1.user_id 
        FROM (
            SELECT z.user_id   
            FROM (
                SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
                INNER JOIN
                (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON z.user_id = y.user_id
                INNER JOIN
                (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON z.user_id = x.user_id
                INNER JOIN
                (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON z.user_id = w.user_id
        ) Sub1
        LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
        ON group_user_map.user_id = Sub2.user_id
        WHERE Sub2.user_id IS NULL
    ) Sub3
    ON  users_table.username = Sub3.user_id
    

    Cleaning up that 2nd query a bit

    SELECT * 
    FROM users_table
    INNER JOIN 
    (
        SELECT z.user_id   
        FROM (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
        INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y 
        ON z.user_id = y.user_id
        INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x 
        ON z.user_id = x.user_id
        INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w 
        ON z.user_id = w.user_id
        LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
        ON z.user_id = Sub2.user_id
        WHERE Sub2.user_id IS NULL
    ) Sub3
    ON  users_table.username = Sub3.user_id
    

    Using your SQL in the comment below, it can be cleaned up to :-

    select SQL_NO_CACHE id 
    from users_table 
    INNER JOIN ( SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (0, 67) ) ij1 
    ON users_table.username = ij1.user_id 
    LEFT OUTER JOIN ( SELECT user_id FROM group_user_map WHERE group_id IN (0) ) Sub2 
    ON users_table.username = Sub2.user_id 
    WHERE Sub2.user_id IS NULL 
    

    Cleaning up my SQL in the same way:-

    SELECT users_table.* 
    FROM users_table
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z ON users_table.username = z.user_id
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON users_table.username = y.user_id
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON users_table.username = x.user_id
    INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON users_table.username = w.user_id
    LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2 ON users_table.username = Sub2.user_id
    WHERE Sub2.user_id IS NULL
    

    Removing the subselects and doing the joins directly (might help or hinder, suspect it will depend on how many duplicate user_id records there are for each set of group_id records)

    SELECT DISTINCT users_table.* 
    FROM users_table
    INNER JOIN group_user_map z ON users_table.username = z.user_id AND z.group_id IN (2601,119)
    INNER JOIN group_user_map y ON users_table.username = y.user_id AND y.group_id IN (58,226)
    INNER JOIN group_user_map x ON users_table.username = x.user_id AND x.group_id IN (1299,525)
    INNER JOIN group_user_map w ON users_table.username = w.user_id AND w.group_id IN (2524,128)
    LEFT OUTER JOIN group_user_map Sub2 ON users_table.username = Sub2.user_id AND Sub2.group_id IN (2572)
    WHERE Sub2.user_id IS NULL