Search code examples
mysqlsqldatabasejoininner-join

INNER JOIN table on itself


I have a table which represents friendships between two users. Each entry is unidirectional; a friendship requires two entries to be represented. I would like to keep it this way.

user1_id | user2_id 
   43        44
   44        43

I'm curious what the best way to query against this kind of setup is. For example, how would query for a list of all friends of a particular user?

The solution I've come up with is to invert and inner join the friendships table onto itself, to get a list of all complete pairs first, then use a normal WHERE clause:

SELECT f.user2_id 
    FROM friendships f
        INNER JOIN friendships f2
            ON f.user1_id = f2.user2_id 
            && f.user2_id = f2.user1_id
        WHERE f.user1_id = 43;

To get into a juicier situation, I would also need to be able to query to get a list of threads created by friends of a user. I'm still adjusting to the abstract thinking required to work with joins, but the solution I've come up with is this:

SELECT thread_id,owner_id,message,time  
    FROM threads th                             
        INNER JOIN friendships f                                
            ON th.owner_id = f.user2_id                             
            && f.user1_id = 43                           
        INNER JOIN friendships f2                               
            ON f.user2_id = f2.user1_id                             
            && f.user1_id = f2.user2_id

This seems to work, but I'm testing it on a very empty database. Since my grasp on joins is still a little weak, I'm afraid that there are situations in which this will come back with bad results. So my questions are:

  1. Are these two queries behaving as I expect them to?
  2. Is there a better way to go about this?

Solution

  • Let's start to answer this question by creating a query which gives a list of all (user, friend) pairs. You've required that a friend pair be a reciprocal relationship, so we need to do that. Otherwise the friendships table would offer those pairs.

    Your query is pretty close. What you need for this query is the inner join.

    SELECT f1.user1_id AS user, 
           f1.user2_id AS friend 
      FROM friendships AS f1
      JOIN friendships f2 
        ON  (f1.user1_id = f2.user2_id AND f1.user2_id = f2.user1_id)
    

    Then, you can employ this query as a virtual table. For example, you can do this to get a list of the friends of user 43.

     SELECT friend
       FROM (
            SELECT f1.user1_id AS user, 
                   f1.user2_id AS friend 
              FROM friendships AS f1
              JOIN friendships f2 
                ON  (f1.user1_id = f2.user2_id AND f1.user2_id = f2.user1_id)
            ) AS friends 
      WHERE user = 43 
    

    You might want to set your friends query up as a view, like so.

     CREATE VIEW friends AS (
            SELECT f1.user1_id AS user, 
                   f1.user2_id AS friend 
              FROM friendships AS f1
              JOIN friendships f2 
                ON  (f1.user1_id = f2.user2_id AND f1.user2_id = f2.user1_id)
     )
    

    That way you can abbreviate your complex queries like this.

     SELECT friend FROM friends WHERE user = 43;
    

    Your juicier queries get easy too:

    SELECT thread_id,owner_id,message,time  
      FROM threads AS th
      JOIN (
            SELECT f1.user1_id AS user, 
                   f1.user2_id AS friend 
              FROM friendships AS f1
              JOIN friendships f2 
                ON  (f1.user1_id = f2.user2_id AND f1.user2_id = f2.user1_id)
    
           ) AS f ON th.owner_id = f.friend
     WHERE f.user = 43
    

    If you use the view you can do this. It means the same thing but is easier to read.

    SELECT thread_id,owner_id,message,time  
      FROM threads AS th
      JOIN friends AS f ON th.owner_id = f.friend
     WHERE f.user = 43
    

    (Note: JOIN and INNER JOIN are synonymous.)

    See how that goes? You can encapsulate your friends query (either as a view or just inline as a virtual table) and use its results. The optimizer knows how to do this quickly when you specify such things as WHERE f.user = 43.

    If you have lots of rows in the friendships table you may find that a pair of compound indexes (user1_id, user2_id) and (user2_id, user1_id) help the performance of these queries

    (It's not a recursive query, comments to the contrary notwithstanding.)