SQL queries slower than expected


Before I show the query here are the relevant table definitions:

CREATE TABLE phpbb_posts (
    topic_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
    poster_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
    KEY topic_id (topic_id),
    KEY poster_id (poster_id)
);


CREATE TABLE phpbb_topics (
    topic_id mediumint(8) UNSIGNED NOT NULL auto_increment,
    PRIMARY KEY (topic_id)
);

Here's the query I'm trying to do:

SELECT p.topic_id, p.poster_id 
FROM phpbb_topics AS t 
LEFT JOIN phpbb_posts AS p 
   ON p.topic_id = t.topic_id 
      AND p.poster_id <> ... 
WHERE p.poster_id IS NULL;

Basically, the query tries to find every topic in which nobody other than the target user has posted. In other words, the topics where the only person who has posted is the target user.
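To make the intent of that LEFT JOIN ... IS NULL pattern concrete, here is a minimal sketch that runs the same query shape against a tiny in-memory SQLite database from Python. The rows, the user id 7 and the choice of SQLite are all assumptions made for illustration (the real forum runs on MySQL), so this only checks what the query returns, not how fast it is. It also selects t.topic_id in addition to the original columns, because the p columns are NULL in every row that survives the WHERE clause.

```python
import sqlite3

# Toy data, made up for illustration:
#   topic 1: only user 7 has posted (twice)
#   topic 2: users 7 and 8 have posted
#   topic 3: only user 8 has posted
#   topic 4: no posts at all
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phpbb_topics (topic_id INTEGER PRIMARY KEY);
    CREATE TABLE phpbb_posts (
        topic_id  INTEGER NOT NULL,
        poster_id INTEGER NOT NULL
    );
    INSERT INTO phpbb_topics (topic_id) VALUES (1), (2), (3), (4);
    INSERT INTO phpbb_posts (topic_id, poster_id) VALUES
        (1, 7), (1, 7), (2, 7), (2, 8), (3, 8);
""")

TARGET = 7  # hypothetical target user id

# Same shape as the query in the question, with t.topic_id added so the
# surviving rows can be identified.
anti_join = """
    SELECT t.topic_id, p.topic_id, p.poster_id
    FROM phpbb_topics AS t
    LEFT JOIN phpbb_posts AS p
        ON p.topic_id = t.topic_id
        AND p.poster_id <> ?
    WHERE p.poster_id IS NULL
    ORDER BY t.topic_id
"""
rows = conn.execute(anti_join, (TARGET,)).fetchall()
print(rows)
```

On this toy data the query keeps topic 1 (only the target has posted) but also topic 4, which has no posts at all, since a topic without posts trivially has no posts by anyone else.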

Problem is, that LEFT JOIN query is taking a super long time. Here's the EXPLAIN for it:

Array
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => t
    [type] => index
    [possible_keys] =>
    [key] => topic_approved
    [key_len] => 1
    [ref] =>
    [rows] => 146484
    [Extra] => Using index
)
Array
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => p
    [type] => ref
    [possible_keys] => topic_id,poster_id,tid_post_time
    [key] => tid_post_time
    [key_len] => 3
    [ref] => db_name.t.topic_id
    [rows] => 1
    [Extra] => Using where; Not exists
)

My general assumption when it comes to SQL is that JOINs of any kind are super fast and can be done in no time at all, assuming all relevant columns are primary or foreign keys (which in this case they are).

I tried out a few other queries:

SELECT COUNT(1) 
    FROM phpbb_topics AS t 
    JOIN phpbb_posts AS p 
        ON p.topic_id = t.topic_id;

That returns 353340 pretty quickly.

I then do these:

SELECT COUNT(1) 
    FROM phpbb_topics AS t 
    JOIN phpbb_posts AS p 
        ON p.topic_id = t.topic_id
            AND p.poster_id <> 77198;

SELECT COUNT(1) 
    FROM phpbb_topics AS t 
    JOIN phpbb_posts AS p 
        ON p.topic_id = t.topic_id
    WHERE p.poster_id <> 77198;

And both of those take quite a while (between 15 and 30 seconds). If I change the <> to an = it takes no time at all.
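For what it's worth, the two slow queries are expected to behave alike: with an inner JOIN, a filter on p selects the same rows whether it sits in the ON clause or in the WHERE clause. The sketch below checks that on made-up data in an in-memory SQLite database (an assumption for illustration, not the MySQL setup from the question). It also shows why = and <> are such different workloads: the = filter touches only the target user's posts, while <> has to count everything else, which on a real forum is nearly the whole table, so an index on poster_id is unlikely to help much there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phpbb_topics (topic_id INTEGER PRIMARY KEY);
    CREATE TABLE phpbb_posts (
        topic_id  INTEGER NOT NULL,
        poster_id INTEGER NOT NULL
    );
    CREATE INDEX posts_topic_id  ON phpbb_posts (topic_id);
    CREATE INDEX posts_poster_id ON phpbb_posts (poster_id);
    INSERT INTO phpbb_topics (topic_id) VALUES (1), (2), (3), (4);
    INSERT INTO phpbb_posts (topic_id, poster_id) VALUES
        (1, 7), (1, 7), (2, 7), (2, 8), (3, 8);
""")

TARGET = 7  # hypothetical target user id

BASE = """
    SELECT COUNT(1)
    FROM phpbb_topics AS t
    JOIN phpbb_posts AS p
        ON p.topic_id = t.topic_id
"""

def count(sql, params=()):
    """Run a COUNT query and return the single number it produces."""
    return conn.execute(sql, params).fetchone()[0]

total = count(BASE)
on_count = count(BASE + " AND p.poster_id <> ?", (TARGET,))      # filter in ON
where_count = count(BASE + " WHERE p.poster_id <> ?", (TARGET,))  # filter in WHERE
eq_count = count(BASE + " WHERE p.poster_id = ?", (TARGET,))

print(total, on_count, where_count, eq_count)
```

Because poster_id is NOT NULL, the = count and the <> count add up to the total number of joined rows.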

Am I making some incorrect assumptions? Maybe my DB is just foobar'd?


Solution

  • SELECT t.topic_id 
    FROM phpbb_topics AS t 
    JOIN phpbb_posts AS p1
       ON p1.topic_id = t.topic_id
          AND p1.poster_id = $poster_id
    LEFT JOIN phpbb_posts AS p2 
       ON p2.topic_id = t.topic_id 
          AND p2.poster_id <> $poster_id
    WHERE p2.poster_id IS NULL
    

    That made it a ton faster. The first join (p1) keeps only the topics where the target user has posted, and the LEFT JOIN (p2) then looks for posts in those topics by anyone other than the target. The WHERE clause keeps the topics where no such post exists.

    There'll be lots of duplicates in the p1.poster_id column but since I'm not actually selecting that column I figure duplicates in it don't matter a whole lot.

    Thanks!
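A small sketch of how the solution query behaves on toy data, again using an in-memory SQLite database from Python as a stand-in for MySQL. The rows and the user id 7 are made up for illustration. It suggests two things worth knowing: a topic where the target posted more than once comes back once per post unless DISTINCT is added, and topics with no posts at all are no longer returned, because the inner join on p1 requires at least one post by the target.

```python
import sqlite3

# Toy data: topic 1 has two posts by user 7 only, topic 2 has users 7 and 8,
# topic 3 has only user 8, topic 4 has no posts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phpbb_topics (topic_id INTEGER PRIMARY KEY);
    CREATE TABLE phpbb_posts (
        topic_id  INTEGER NOT NULL,
        poster_id INTEGER NOT NULL
    );
    INSERT INTO phpbb_topics (topic_id) VALUES (1), (2), (3), (4);
    INSERT INTO phpbb_posts (topic_id, poster_id) VALUES
        (1, 7), (1, 7), (2, 7), (2, 8), (3, 8);
""")

TARGET = 7  # hypothetical target user id, standing in for $poster_id

solution = """
    SELECT t.topic_id
    FROM phpbb_topics AS t
    JOIN phpbb_posts AS p1
        ON p1.topic_id = t.topic_id
        AND p1.poster_id = ?
    LEFT JOIN phpbb_posts AS p2
        ON p2.topic_id = t.topic_id
        AND p2.poster_id <> ?
    WHERE p2.poster_id IS NULL
    ORDER BY t.topic_id
"""
# Same query with DISTINCT, so each topic appears at most once.
solution_distinct = solution.replace("SELECT", "SELECT DISTINCT", 1)

with_duplicates = [r[0] for r in conn.execute(solution, (TARGET, TARGET))]
deduplicated = [r[0] for r in conn.execute(solution_distinct, (TARGET, TARGET))]

print(with_duplicates, deduplicated)
```

Whether the duplicates matter depends on what consumes the result; if each topic should be listed once, DISTINCT (or GROUP BY t.topic_id) is one way to get there.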