Search code examples
sqlplsqlsum

PL/SQL: How to sum a set of values if they fall within a specific time frame?


I have a query (below) that shows the number of terminations since 1/1/17 in one column and the associated date of the terminations in the only other column. If there were no terminations on a specific date, then there is no record for that date.

I want to create rolling 12-month time buckets and sum the number of terminations in those time buckets.

For example, the most recent time bucket would have an ending date of 11:59pm on 6/30/22. The start of that time bucket would start midnight on 7/1/21. I want to sum the number of terminations in that time bucket.

I need to create 12-month time buckets and the associated number of terminations for the last 60 months, resulting in 60 time buckets.

Here is my current query:

select 
    count(distinct employee_number) Number_of_terminations
    , to_char(term_date, 'MM/DD/YYYY') term_date
from 
        (
                select paa.person_id
                  ,max(paa.effective_end_date)+1 term_date
                  ,pap.employee_number
                                        
                from 
                apps.per_all_assignments_f paa
                , apps.per_assignment_status_types past
                    ,(select distinct paa.person_id
                        from 
                        apps.per_all_assignments_f paa
                        , apps.per_assignment_status_types past
                        where paa.assignment_status_type_id = past.assignment_status_type_id
                        and sysdate between paa.effective_start_date and paa.effective_end_date
                        and past.user_status in ('Active Assignment','Transitional - Active','Transitional - Inactive','Sabbatical','Sabbatical 50%')) active_person
                , apps.per_all_people_f pap    
                , apps.hr_organization_units org
                
                  ,(select case when orgp.name = 'Random University' then orgc.attribute1 else orgp.attribute1 end unit_number
                      ,case when orgp.name = 'Random State University' then orgc.name else orgp.name end unit_name
                      ,orgc.attribute1 dept_number
                      ,orgc.name dept_name
                    from apps.per_org_structure_elements_v2 pose
                      ,apps.per_org_structure_versions posv
                      ,apps.hr_all_organization_units orgp
                      ,apps.hr_all_organization_units orgc
                    where pose.org_structure_version_id = posv.org_structure_version_id
                    and pose.organization_id_parent = orgp.organization_id
                    and pose.organization_id_child = orgc.organization_id
                    and trunc(sysdate) between posv.date_from and nvl(posv.date_to,'31-dec-4712')
                    and pose.org_structure_hierarchy = 'Units'
                    order by case when orgp.name = 'Colorado State University' then orgc.attribute1 else orgp.attribute1 end
                      ,orgc.attribute1) u
                              
                 , apps.per_jobs pj
                , apps.per_job_definitions pjd
                
                where paa.assignment_status_type_id = past.assignment_status_type_id
                and paa.person_id = active_person.person_id(+)
                and active_person.person_id is null
                and past.user_status in ('Active Assignment','Transitional - Active','Transitional - Inactive','Sabbatical','Sabbatical 50%')
                and pap.person_id = paa.person_id
                and paa.organization_id = org.organization_id
                and org.attribute1 = u.dept_number(+)
                and paa.job_id = pj.job_id
                and pj.job_definition_id = pjd.job_definition_id
                and pap.employee_number is not null
                and (
                    paa.effective_end_date like '%17' or
                    paa.effective_end_date like '%18' or
                    paa.effective_end_date like '%19' or
                    paa.effective_end_date like '%20' or
                    paa.effective_end_date like '%21' or
                    paa.effective_end_date like '%22' 
                    )
             
                group by paa.person_id
                , pap.employee_number


        ) terms
        --group by substr(term_date, 4, 6)
        group by to_char(term_date, 'MM/DD/YYYY')

Here are the first rows of the results: enter image description here

In Excel the first sum would like be calculated like this: Excel example


Solution

  • I don't have your data and I don't want to spend time generating some test data to match that monster query but here is a simplified example explaining how to do this:

    • Create a calendar table: 1 record per bucket (monthly) with start and end date.
    CREATE TABLE last_60_months (start_dt, end_dt) 
    AS
    (SELECT TRUNC(ADD_MONTHS(SYSDATE,-LEVEL+1), 'MON'), TRUNC(ADD_MONTHS(SYSDATE,-LEVEL+13), 'MON') - 1 FROM DUAL 
     CONNECT BY LEVEL < 61
    );
    
    • Create a test table with 10000 employees and a termination date within the test buckets boundaries:
    CREATE table test_emps (employee_number NUMBER, term_date DATE);
    
    DECLARE
      l_dt DATE;
      l_min_dt  DATE;
      l_max_dt  DATE;
    BEGIN
      SELECT MIN(start_dt), MAX(start_dt) INTO l_min_dt, l_max_dt FROM last_60_months;
      FOR r IN 1 .. 10000 LOOP
        SELECT TO_DATE(
                     TRUNC(
                          DBMS_RANDOM.VALUE(TO_CHAR(l_min_dt,'J')
                                           ,TO_CHAR(l_max_dt,'J')
                                           )
                           ),'J'
                      ) 
         INTO l_dt
         FROM DUAL;  
        INSERT INTO test_emps (employee_number, term_date) VALUES (r, l_dt );
      END LOOP;
      COMMIT;
    END;
    /
    
    

    Put it all together:

    SELECT COUNT(e.employee_number) as "Number_of_terminations", d.start_dt, d.end_dt
      FROM test_emps e JOIN last_60_months d ON e.term_date BETWEEN d.start_dt AND d.end_dt 
     GROUP BY start_dt, end_dt
     ORDER BY start_dt;
    

    It should be trivial to use this technique for your own data.