Tags: sql, postgresql, postgresql-performance

Query with UNION or OR?


I have a table friend used to store the relationship between two users.

For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but we don't store that row; instead we manually make sure that uid1 < uid2:

CREATE TABLE public.friend (
  uid1 INTEGER,
  uid2 INTEGER
);
CREATE INDEX index_uid1 ON friend USING BTREE (uid1);
CREATE INDEX index_uid2 ON friend USING BTREE (uid2);

To find the friends of uid=2, I can use:

Sql1:

select * from friend where uid1=2
union
select * from friend where uid2=2;

Sql2:

select * from friend where uid1=2 or uid2=2;

In my tests, Sql2 performs better than Sql1.

But Sql1 is the one that is recommended. Is that correct?


Solution

  • I have a table friend used to store the relationship between two users.

    For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but we don't store that row; instead we manually make sure that uid1 < uid2:

    Typically, you would implement that with a PRIMARY KEY on (uid1, uid2) and a CHECK constraint enforcing uid1 < uid2.

    CREATE TABLE public.friend (
       uid1 integer
     , uid2 integer
     , PRIMARY KEY (uid1, uid2)
     , CONSTRAINT uid2_gt_uid1 CHECK (uid1 < uid2)
    );
    
    CREATE INDEX index_uid2 ON friend USING BTREE (uid2);
    

    You don't need your other index (repeated below, do not create it). The index of the PK on (uid1, uid2) has uid1 as its leading column and covers it:

    CREATE INDEX index_uid1 ON friend USING BTREE (uid1);

    Then there cannot be duplicates (including switched duplicates), nobody can be friends with themselves either, and your query can simply be:

    SELECT * FROM friend WHERE 2 IN (uid1, uid2);
    

    ... which is short for:

    SELECT * FROM friend WHERE uid1 = 2 OR uid2 = 2;
    

    And the UNION variant is now logically identical:

    SELECT * FROM friend WHERE uid1=2
    UNION
    SELECT * FROM friend WHERE uid2=2;
    

    But you would use UNION ALL instead of just UNION, as there are no duplicates to begin with and UNION ALL is cheaper. It is still slightly more expensive than the single SELECT above, though.
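
    To check the cost difference on your own data rather than take it on faith, you can compare the actual plans. This is a minimal sketch, assuming the friend table from above is populated and has been analyzed. Plans and timings depend on table size and data distribution, so no output is shown here:

    ```sql
    -- Refresh planner statistics so the estimates are meaningful.
    ANALYZE friend;

    -- Single SELECT with OR: the planner may combine both indexes
    -- (for example with a BitmapOr), depending on the data.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM friend WHERE uid1 = 2 OR uid2 = 2;

    -- UNION ALL: two index lookups appended, no duplicate removal.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM friend WHERE uid1 = 2
    UNION ALL
    SELECT * FROM friend WHERE uid2 = 2;

    -- UNION: same as above plus an extra step to remove duplicates.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM friend WHERE uid1 = 2
    UNION
    SELECT * FROM friend WHERE uid2 = 2;
    ```

    Run each statement a few times and compare the reported execution times. A single run can be skewed by caching.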

    Duplicates?

    There are three possible sources of duplicates in the UNION ALL query:

    1. Duplicate rows in the underlying table (ruled out by the PK).
    2. Rows fetched multiple times by more than one SELECT branch.
    3. In your particular case: logical duplicates with switched IDs (ruled out by the CHECK constraint).

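    Once you understand this, you also understand the implications of the different query techniques. With the suggested setup, only 2. remains as a possible issue in general. For the query above it cannot occur either: the CHECK constraint rules out uid1 = uid2, so no row can satisfy both uid1 = 2 and uid2 = 2.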
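
    As an illustration of how the constraints rule out sources 1. and 3., here is a small sketch against the suggested table definition. The sample values are made up, and ordering the pair with LEAST / GREATEST at insert time is just one possible way to replace the manual ordering, not something the setup requires. ON CONFLICT needs PostgreSQL 9.5 or later:

    ```sql
    -- Store the pair (2, 1) in normalized order, so it becomes (1, 2).
    -- ON CONFLICT DO NOTHING skips the row if the PK already contains it.
    INSERT INTO friend (uid1, uid2)
    VALUES (LEAST(2, 1), GREATEST(2, 1))
    ON CONFLICT DO NOTHING;

    -- The statements below would fail, which is the point of the constraints.
    -- They are commented out so the script runs as a whole.

    -- Violates CHECK (uid1 < uid2): switched duplicate of (1, 2).
    -- INSERT INTO friend (uid1, uid2) VALUES (2, 1);

    -- Violates CHECK (uid1 < uid2): a user befriending themselves.
    -- INSERT INTO friend (uid1, uid2) VALUES (3, 3);

    -- Violates the PRIMARY KEY: plain duplicate of an existing row.
    -- INSERT INTO friend (uid1, uid2) VALUES (1, 2);
    ```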