Search code examples
sqloraclequery-optimization

Looking for performance improvements in the SQL


Scenario: There are 2 columns in the table with data as given in the sample below. It is possible that the table has multiple rows for the same value of 'a' column.

In the example, Considering the 'a' column, There are three rows for '1' and one row for '2'.

Sample table 't1':

|a|b  |
|1|1.1|
|1|1.2|
|1|2.2|
|2|3.1|

Requirement is to get following output:

Expected Query output:

|a|b  |
|1|1.2|
|2|3.1|

Requirement:

  • Get the row if there is only one row for a given value for column 'a'.
  • If there are multiple rows for the same value for column 'a' and for all rows, FLOOR(b) == a, then get MIN(a) and MAX(b)
  • If there are multiple rows for column 'a' and for all rows, there is 1 row of column 'b' for which FLOOR(b) > a, then ignore that row. from the remaining rows, get MIN(a) and MAX(b)

Query I used:

select distinct min(a) over(partition by table1.a) as a,
min(b) over(partition by table1.a) as b
from (
    SELECT  distinct Min(table2.a) OVER (PARTITION BY table2.a) AS a, 
    Max(table2.b) OVER (PARTITION BY table2.a) AS b
    FROM t1 table2
  union
    SELECT  distinct Min(table3.a) OVER (PARTITION BY table3.a) AS a, 
    Max(table3.b) OVER (PARTITION BY table3.a) AS b
    FROM t1 table3 
    where table3.a = FLOOR(table3.b)
) table1;

This query is working and I am getting the desired output. Looking for inputs to improve by removing union and extra select from the above script.

Note: t1 is not a table but it's a procedure call in my case and there are additional columns that it returns. It would help if the extra call to the procedure can be avoided.


Solution

  • This is how I would get the data you need.

    select t1.a, max(t1.b) 
      from (select a, b, count(1) over(partition by t1.a) cnt from t1) t1
     where t1.a = floor(t1.b) or cnt = 1
     group by t1.a ,cnt;
    

    It has only one procedure call so it might run significantly faster

    And please note that "union" clause not only appends two data sets, but removes duplicates as well. Removing duplicates causes additional checks between data sets and therefore is leading to performance issues.

    It is in most cases better to use "union all" which doesn't check for duplicates