Search code examples
sqloracle-databaseaveragewindow-functions

Analytic functions and means of window clause


I'm using Oracle and SQL Developer. I have downloaded HR schema and need to do some queries with it. Now I'm working with table Employees. As an user I need to see employees with the highest gap between their salary and the average salary of all later hired colleagues in corresponding department. It seems quite interesting and really complicated. I have read some documentation and tried, for example LEAD(), that provides access to more than one row of a table at the same time:

SELECT
    employee_id,
    first_name
    || ' '
    || last_name,
    department_id,
    salary,
    hire_date,
    LEAD(hire_date)
    OVER(PARTITION BY department_id
         ORDER BY
             hire_date DESC
    ) AS Prev_hiredate
FROM
    employees
ORDER BY
    department_id,
    hire_date;

That shows for every person in department hiredate of later hired person. Also I have tried to use window clause to understand its concepts:

SELECT
    employee_id,
    first_name
    || ' '
    || last_name,
    department_id,
    hire_date,
    salary,
    AVG(salary)
    OVER(PARTITION BY department_id
         ORDER BY
             hire_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
    ) AS avg_sal
FROM
    employees
ORDER BY
    department_id,
    hire_date;

The result of this query will be: Result of query for 20 department

Result of query for 60 department

However, it is not exactly what I need. I need to reduce the result just by adding column with gap (salary-avr_sal), where the gap will be highest and receive one employee per department. How should the result look like: for example, we have 60 department. We have 5 employees there ordering by hire_date. First has salary 4800, second – 9000, third – 4800, fourth – 4200, fifth – 6000. If we do calculations: 4800 - ((9000+4800+4200+6000)/4)=-1200, 9000-((4800+4200+6000)/3)=4000, 4800 -((4200+6000)/2)=-300, 4200 - 6000=-1800 and the last person in department will have the highest gap: 6000 - 0 = 6000. Let's take a look on 20 department. We have two people there: first has salary 13000, second – 6000. Calculations: 13000 - 6000 = 7000, 6000 - 0 = 6000. The highest gap will be for first person. So for department 20 the result should be person with salary 13000, for department 60 the result should be person with salary 6000 and so on. How should look my query to get the appropriate result (what I need is marked bold up, also I want to see column with highest gap, can be different solutions with analytic functions, but should be necessarily included window clause)?


Solution

  • Maybe this is what you are looking for.
    Sample data:

    WITH
        emp (ID, EMP_NAME, HIRE_DATE, SALARY, DEPT) AS 
            (
                Select 601, 'HILLER',   To_Date('23-JAN-82', 'dd-MON-yy'),  4800,  60 From Dual Union All
                Select 602, 'MILLER',   To_Date('23-FEB-82', 'dd-MON-yy'),  9000,  60 From Dual Union All
                Select 603, 'SMITH',    To_Date('23-MAR-82', 'dd-MON-yy'),  4800,  60 From Dual Union All
                Select 604, 'FORD',     To_Date('23-APR-82', 'dd-MON-yy'),  4200,  60 From Dual Union All
                Select 605, 'KING',   To_Date('23-MAY-82', 'dd-MON-yy'),    6000,  60 From Dual Union All
                Select 201, 'SCOT',   To_Date('23-MAR-82', 'dd-MON-yy'), 13000,  20 From Dual Union All
                Select 202, 'JONES',    To_Date('23-AUG-82', 'dd-MON-yy'),  6000,  20 From Dual 
            ),
    

    Create CTE named grid with several analytic functions and windowing clauses. They are not all needed but the resulting dataset below shows the logic with all components included.

        grid AS
            (
              Select 
                  g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
              From
                  (
                      Select  
                          ROWNUM "RN",
                          Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN Unbounded Preceding And Current Row) "RN_DEPT",
                          ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
                          --
                          Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "SUM_SAL_LATER",
                          Nvl(Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "COUNT_EMP_LATER",
                          --
                          Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
                          Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "AVG_LATER",
                          --
                          SALARY - 
                          Nvl((
                          Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
                          Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
                          ), 0)  "GAP"
                      from 
                          emp
                      Order By
                          DEPT, HIRE_DATE, ID
                  ) g
              Order By
                  RN
            )
    

    CTE grid resultiing dataset:

    RN RN_DEPT ID EMP_NAME HIRE_DATE DEPT SALARY SUM_SAL_LATER COUNT_EMP_LATER AVG_LATER GAP DEPT_MAX_GAP
    1 1 601 HILLER 23-JAN-82 60 4800 24000 4 6000 -1200 6000
    2 2 602 MILLER 23-FEB-82 60 9000 15000 3 5000 4000 6000
    3 3 603 SMITH 23-MAR-82 60 4800 10200 2 5100 -300 6000
    4 4 604 FORD 23-APR-82 60 4200 6000 1 6000 -1800 6000
    5 5 605 KING 23-MAY-82 60 6000 0 0 0 6000 6000
    6 1 201 SCOT 23-MAR-82 20 13000 6000 1 6000 7000 7000
    7 2 202 JONES 23-AUG-82 20 6000 0 0 0 6000 7000

    Main SQL

    SELECT
        g.ID, g.EMP_NAME, g.HIRE_DATE, g.DEPT, g.SALARY, g.GAP
    FROM
        grid g
    WHERE
        g.GAP = g.DEPT_MAX_GAP
    Order By 
        RN
    

    Resulting as:

    ID EMP_NAME HIRE_DATE DEPT SALARY GAP
    605 KING 23-MAY-82 60 6000 6000
    201 SCOT 23-MAR-82 20 13000 7000

    Without CTE and with all unnecessery columns excluded it looks like this:

    SELECT    ID, EMP_NAME, HIRE_DATE, DEPT, SALARY, GAP
    FROM      
        (
            ( Select g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
              From(   Select  
                          ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
                          SALARY - 
                          Nvl(( Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
                                Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
                              ), 0)  "GAP"
                      From emp
                      Order By DEPT, HIRE_DATE, ID
                  ) g
            )    
        )
    WHERE     GAP = DEPT_MAX_GAP
    Order By  DEPT, HIRE_DATE, ID
    

    It seems like this is all you need.

    Regards...