Search code examples
sqlperformanceoracle-databaseoracle11gexadata

Impact of COUNT() Based on Column in a Table


Edit:

Database - Oracle 11gR2 - Over Exadata (X2)


Am framing a problem investigation report for past issues, and I'm slightly confused with a situation below.

Say , I have a table MYACCT. There are 138 Columns. It holds 10 Million records. With regular updates of at least 1000 records(Inserts/Updates/Deletes) per hour.

Primary Key is COL1 (VARCHAR2(18)) (Application rarely uses this, except for a join with other tables)

There's Another Unique Index over COL2 VARCHAR2(9)). Which is the one App regularly uses. What ever updates I meant previously happens based on both these columns. Whereas any SELECT only operation over this table, always refer COL2. So COL2 would be our interest.

We do a Query below,

SELECT COUNT(COL2) FROM MYACCT; /* Use the Unique Column (Not PK) */

There's no issue with the result, whereas I was the one recommending to change it as

SELECT COUNT(COL1) FROM MYACCT; /* Use the primary Index

I just calculated the time taken for actual execution

Query using the PRIMARY KEY was faster by `0.8-1.0 seconds always!

Now, I am trying to explain this behavior. Just drafting the Explain plan behind these Queries.

Query 1: (Using Primary Key)

SELECT COUNT(COL1) FROM MYACCT;

Plan :

SQL> select * from TABLE(dbms_xplan.display);
Plan hash value: 2417095184

---------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 | 11337   (1)| 00:02:17 |
|   1 |  SORT AGGREGATE               |         |     1 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| PK_ACCT |    10M| 11337   (1)| 00:02:17 |
---------------------------------------------------------------------------------

9 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41332  consistent gets
          0  physical reads
          0  redo size
        210  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Query 2:(NOT using Primary Key)

SELECT COUNT(COL2) FROM MYACCT;

Plan :

Plan hash value: 1130703739

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |     1 |    10 |  7868   (1)| 00:01:35 |
|   1 |  SORT AGGREGATE               |          |     1 |    10 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| MYINDX01 |    10M|    95M|  7868   (1)| 00:01:35 |
------------------------------------------------------------------------------------------

9 rows selected.

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      28151  consistent gets
         23  physical reads
        784  redo size
        233  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We can find in terms of Cost and Time the Query without primary key wins. Then how come the execution time of primary key is faster???

EDIT:

SQL> select segment_name, bytes from dba_segments where segment_name in ('MYINDX01','PK_ACCT');
PK_ACCT               343932928
MYINDX01              234881024

Solution

  • You're reading more data from the PK index than you are from the other one. COL1 is VARCHAR2(18) while COL2 is VARCHAR(9), which doesn't necessarily mean anything but implies you probably have values in COL1 that are consistently longer than those in COL2. They will therefore use more storage, both in the table and in the index, and the index scan has to pull more data from the block buffer and/or disk for the PK-based query.

    The execution statistics show that; 41332 consistent gets for the PK-based query, and only 28151 for the faster one, so it's doing more work with the PK. And the segment sizes show it too - for the PK you need to read about 328M, for the UK only 224M.

    The block buffer is likely to be crucial if you're seeing the PK version run faster sometimes. In the example you've shown both queries are hitting the block buffer - the 23 physical reads are a trivial number, If the index data wasn't cached consistently then you might see 41k consistent gets versus 28k physical reads, which would likely reverse the apparent winner as physical reads from disk will be slower. This often manifests if running two queries back to back shows one faster, but reversing the order they run shows the other as faster.

    You can't generalise this to 'PK query is slower then UK query'; this is because of your specific data. You'd probably also get better performance if your PK was actually a number column, rather than a VARCHAR2 column holding numbers, which is never a good idea.