Search code examples
sqlrdbms

Expensive query on database side


Let's consider this two tables:

|     Person    |
----------------
|  ID  | Name   | 
_________________
|  1   |  alice |
|  2   |  bob   |  
|  ... |  ...   |  
| 99   |  zach  |  


| PersonFriend  |
----------------
|  ID  |FriendID| 
_________________
|  1   |  2     |
|  2   |  1     |  
|  2   |  99    |
| ...  |  ...   |  
| 99   |  1     |  

These are two tables to model friends and friends-of-friends. Now the first query is:
1) Who are Bob’s friends?
The query using SQL is:

SELECT p1.Person
FROM Person p1 JOIN PersonFriend  ON PersonFriend.FriendID = p1.ID 
     JOIN Person p2  ON PersonFriend.PersonID = p2.ID 
WHERE p2.Person = 'bob'

Now consider the viceversa:
2) Who is friends with Bob?
The query in SQL is:

SELECT p1.Person 
FROM Person p1 JOIN PersonFriend  ON PersonFriend.PersonID = p1.ID 
     JOIN Person p2  ON PersonFriend.FriendID = p2.ID 
WHERE p2.Person = 'bob' 

Well, these two queries are both easy, but I don't understand why the second one is more expensive on the database side than the first. Can someone help me?


Solution

  • I try to reply on my own. Let's consider the query tree of the two queries:

    1) Who are Bob’s friends?

       PROJ p1.Person
              |
              |
         J   O   I   N 
     p1.ID=PersonFriend.FriendID 
         /            \
        /              \
       /                \
    Person p1    ###### J   O   I   N ######
                  p2.ID=PersonFriend.PersonID 
                          /         \
                         /           \ 
                        /             \ 
                    SEL(Persona p2)   PersonFriend
               p2.Person="bob"
    

    2) Who is friends with Bob?

         PROJ p1.Person
                  |
                  |
             J   O   I   N 
         p1.ID=PersonFriend.PersonID 
             /            \
            /              \
           /                \
        Person p1    ###### J   O   I   N ######
                      p2.ID=PersonFriend.FriendID  
                              /         \
                             /           \ 
                            /             \ 
                        SEL(Persona p2)   PersonFriend
                   p2.Person="bob"
    

    The key of PersonFriend table is compound: , and records are sorted by ID. As you can see the two joins marked by hashes have differents arguments. The first one does not need to iterate all records of PersonFriend table, because records are sorted by ID. The second join, needs to iterate all records of PersonFriend table.
    I think this is the reason why the second query is more expensive than the first.