Search code examples
postgresqlindexingsql-execution-planunique-index

Is any performance enhancement when we used Unique index instead of non Unique index?


I know a unique index will be faster than a non unique index theoretically if data is unique.

Because a unique index is able to provide more information and let the query optimizer choose more effective execution plan.

I am doing some testing and want to prove that a unique index could be better than a non unique index from the execution plan, but the result show me those are the same...

CREATE TABLE T3(
    ID INT NOT NULL,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

CREATE INDEX IX_ID_T3 ON T3 (ID);
CREATE UNIQUE INDEX UIX_ID_T3 ON T3 (ID);

INSERT INTO T3
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

vacuum ANALYZE T3;

I have created a table and two indexes (IX_ID_T3 is non unique,UIX_ID_T3 is unique) then inserted 1000000 sample rows.

After I inserted data I ran vacuum ANALYZE T3;

--drop index IX_ID_T3 

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT DISTINCT a1.ID
FROM T3 a1 INNER JOIN T3 a2
ON a1.id = a2.id
WHERE a1.id <= 300000

First query, I had tried to test between UIX_ID_T3 and IX_ID_T3 by Merge-Join

The Buffers: shared hit and execution plan are no different.

Here is my execution plan

-- UIX_ID_T3 
"Unique  (cost=0.85..41457.94 rows=298372 width=4) (actual time=0.030..267.207 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40712.01 rows=298372 width=4) (actual time=0.030..200.412 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using uix_id_t3 on t3 a1  (cost=0.42..8501.93 rows=298372 width=4) (actual time=0.017..49.237 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using uix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.010..40.170 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.171 ms"
"Execution Time: 282.919 ms"

---IX_ID_T3 
"Unique  (cost=0.85..41420.43 rows=297587 width=4) (actual time=0.027..230.256 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40676.46 rows=297587 width=4) (actual time=0.027..173.308 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using ix_id_t3 on t3 a1  (cost=0.42..8476.20 rows=297587 width=4) (actual time=0.015..41.606 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using ix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.009..34.019 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.195 ms"
"Execution Time: 243.711 ms"

There was another question are-unique-indexes-better-for-column-search-performance-pgsql-mysql to discuss with this topic.

I had also tried to test the answer of question query, but the execution plan are no different.

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT  id
FROM    T3
ORDER BY
        id
LIMIT 10;
-- using IX_ID_T3 
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.034..0.036 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using uix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.033..0.034 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.052 ms"
"Execution Time: 0.047 ms"

-- using IX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.026..0.029 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using ix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.025..0.027 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.075 ms"
"Execution Time: 0.043 ms"

I saw a lot of different articles but I can't prove that a unique index could be better than a non unique index through the execution plan.

Postgres unique constraint vs index

Question:

Can anyone prove a unique index could be better than a non unique index from an execution plan and show us the queries and executions plan?

To my knowledge from unique index of sql-server not only be a constraint but also can be better performance than non unique index.

The Many Mysteries of Merge Joins


Solution

  • An unique index won't be any faster to scan than a non-unique one. The only potential benefit in query execution speed could be that the optimizer can make certain deductions from the uniqueness and for example remove an unnecessary join.

    The primary use of unique indexes is to implement table constraints, not to provide a performance advantage over non-unique indexes.

    Here is an example:

    CREATE TABLE parent (pid bigint PRIMARY KEY);
    
    CREATE TABLE child (
       cid bigint PRIMARY KEY,
       pid bigint UNIQUE REFERENCES parent
    );
    
    EXPLAIN (COSTS OFF)
    SELECT parent.pid FROM parent LEFT JOIN child USING (pid);
    
         QUERY PLAN     
    ════════════════════
     Seq Scan on parent
    (1 row)
    

    Without the unique constraint on child.pid (which is implemented by a unique index) the join could not be removed.