How to improve performance in Oracle using SELECT DISTINCT

I'm currently working in the deployment of an OFBiz based ERP The database being used is Oracle 10g Enterprise

One of the biggest issues is some oracle performance problems, analyzing the ofbiz logs, the following query:

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE ((STATUS_ID = :v0 OR STATUS_ID = :v1 OR STATUS_ID = :v2) AND 
(ORDER_TYPE_ID = :v3)) ORDER BY ORDER_DATE DESC

is very slow. We've tested executing the query without the DISTINCT and it takes about 30 seconds. There are 4.000.000+ registers in the table. There are index for the PK field orderId and almost every other field

The EXPLAIN PLAN with DISTINCT is:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  HASH (UNIQUE) (null)
   TABLE ACCESS (FULL)  ORDER_HEADER

and without the DISTINCT is:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  TABLE ACCESS (FULL)   ORDER_HEADER

any ideas about tuning oracle to improve the performance of this kind of queries? It's very difficult to rewrite the query because is automatically generated by ofbiz so I think the solution is about tuning oracle

thanks in advance

EDIT: I analyzed the query using tkprof ,as suggested by Rob van Wijk and haffax,and the result is the following

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      9.10     160.81      66729      65203         37          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      9.14     160.83      66729      65203         37          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58  

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8178        0.28        146.55
  direct path write temp                       2200        0.04          4.22
  direct path read temp                          36        0.14          2.01
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1        3.36          3.36
********************************************************************************

So it seems the problem is the 'db file scattered read', any ideas to how to tune oracle in order to reduce the wait in this event?

Follow up with the new tkprof result, this time closing the session:

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY,
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID,
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID,
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE,
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP,
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        1      8.23      47.66      66576      65203         31          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.26      47.68      66576      65203         31          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58 

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY (cr=65203 pr=66576 pw=75025 time=47666679 us)
3456659   TABLE ACCESS FULL ORDER_HEADER (cr=65203 pr=65188 pw=0 time=20757300 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8179        0.14         34.96
  direct path write temp                       2230        0.00          3.91
  direct path read temp                          52        0.14          0.84
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1     1510.62       1510.62
********************************************************************************

Solution

If the difference between the two queries is substantial, that would be surprising. You mention that the query without DISTINCT takes about 30 seconds. How much time does the query with the DISTINCT take?

Can you show the tkprof output of the query with the DISTINCT, after you traced the session with a "alter session set events '10046 trace name context forever, level 8'", and disconnect after the query has finished? This way we can see where time is actually being spent and if it was waiting for something ("direct path read temp" maybe?)

Regards, Rob.

Followup, after the tkprof file was posted:

I see you managed to get the tkprof output, but unfortunately you didn't disconnect your session before creating the tkprof file. Now, the cursor was left open and it failed to write STAT# lines to your trace file. This is why you don't have a plan / row source operation in your tkprof file. It would be nice if you can repeat the process, if the suggestion below turns out to be rubbish.

A little speculation from my side: I think the DISTINCT is almost a no-op because you are selecting so many columns. If this is true, then your predicate "WHERE STATUS_ID = 'ORDER_COMPLETED'" is very selective and you will benefit from having an index on this column. After you create the index, make sure you analyze it properly, maybe even with a histogram on if data values are skewed. The end result will be a different plan for this query, starting with an INDEX RANGE SCAN, followed by a TABLE ACCESS BY ROWID, leading to a very fast query.

After you have created an index, you can have the table re-analyzed, including histograms using this statement:

exec dbms_stats.gather_table_stats([owner],[table_name],cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ')

Regards, Rob.