Search code examples
oracle-databaseperformanceoptimizationindexing

How to optimize SQL query and create index


I need to generate statistics for different types of disabilities and count how many women, who started suffering from each condition in 2018, were affected during each week of January (first month of the year). The conditions I need to track are:

  • Mental disorders
  • Cognitive disorders
  • Social disorders

The data is stored in a table, where I need to:

Retrieve the data for each condition. Break the data down by week (1st, 2nd, 3rd, and 4th week) for the first month of 2018. Filter the data for women, based on the fact that the second and third digits of the birth number are greater than or equal to 51.

SELECT 
    disability_name,
    SUM(CASE WHEN TO_CHAR(start_date, 'IW') = '01' THEN 1 ELSE 0 END) AS first_week,
    SUM(CASE WHEN TO_CHAR(start_date, 'IW') = '02' THEN 1 ELSE 0 END) AS second_week,
    SUM(CASE WHEN TO_CHAR(start_date, 'IW') = '03' THEN 1 ELSE 0 END) AS third_week,
    SUM(CASE WHEN TO_CHAR(start_date, 'IW') = '04' THEN 1 ELSE 0 END) AS fourth_week
FROM p_disabilities
JOIN p_disability_type
    ON p_disabilities.id_disability = p_disability_type.disability_id
WHERE TO_CHAR(start_date, 'YYYY') = '2018'
  AND TO_CHAR(start_date, 'MM') = '01'
  AND TO_NUMBER(SUBSTR(birth_number, 3, 2)) >= 51
GROUP BY disability_name;

I am trying to simplify this SQL query to make it more efficient. Specifically, I want to improve the performance of the query since it's working with potentially large datasets. What are some ways I can simplify or optimize this query?

Additionally, what would be the most suitable index to use on the tables involved, particularly on start_date, birth_number, and disability_id, to ensure this query runs as quickly as possible?

Script for tables :

CREATE TABLE p_disabilities ( 
id_disability CHAR(6) NOT NULL PRIMARY KEY, 
birth_number CHAR(11) NOT NULL, 
start_date DATE NOT NULL, 
end_date DATE NOT NULL, 
disability_id NUMBER NOT NULL ); 

CREATE TABLE p_disability_type ( 
disability_id NUMBER NOT NULL, 
disability_name VARCHAR2(50) );

Solution

  • You can give a try to:

    CREATE INDEX ix_comp_datyearmonthbirth ON p_disabilities(
    EXTRACT(year FROM start_date), EXTRACT(month FROM start_date), TO_NUMBER(SUBSTR(birth_number, 2, 2) DEFAULT 0 ON CONVERSION ERROR)) ;
    
    SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'p_disabilities') FROM DUAL;
    EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'p_disabilities');
    

    and the WHERE clause changed to

    WHERE EXTRACT(year FROM d.start_date) = 2018 
      AND EXTRACT(month FROM d.start_date) = 1
      AND TO_NUMBER(SUBSTR(d.birth_number, 2, 2) DEFAULT 0 ON CONVERSION ERROR) >= 51
    

    Then compare the plans with the INDEX including "disability_name" as additional last column to see if an impact on the group/order by. (you can create both indexes and play with their visibility for the testing)