Search code examples
oracle12cmultiple-tableshierarchical-query

Oracle hierarchical query joining 2 tables


I have 2 tables: projects and tasks (that's a simplification, but will do for now). PROJECTS has a hierarchy of projects where there can be a detached (standalone) project or a projects hierarchy. TASKS contains a list of tasks. Tasks are hierarchical in nature and are linked to projects. In case of the projects hierarchy, tasks in the top level are kind of templates and can be redefined in the lower level projects (in my case, i have a hierarchy of 2 projects). That is, i want:

  1. Determine the top-level project starting with the lowest level execution project
  2. Starting from the top-level project, display hierarchy of tasks that will show the lowest project level for identical task IDs.

Sample table structure:

create table projects
( project_id varchar(14)
, parent_project_id varchar(14));

create table tasks
( project_id varchar(14)
, task_id varchar(6)
, parent_task_id varchar(6)
, inactive char(1)
, redefined char(1)
);

Sample data: PROJECTS:

PROJECT_ID PARENT_PROJECT_ID
MASTER_PROJECT ' '
EXECUTION_PROJ MASTER_PROJECT
UNRELATED_PROJ ' '

TASKS:

PROJECT_ID TASK_ID PARENT_TASK_ID INACTIVE REDEFINED
EXECUTION_PROJ 101000 100000 X X
EXECUTION_PROJ 101010 101000 X
EXECUTION_PROJ 101011 101010 X
EXECUTION_PROJ 101012 101010
EXECUTION_PROJ 102000 100000 X
EXECUTION_PROJ 102010 102000
EXECUTION_PROJ 102011 102010
EXECUTION_PROJ 210000 200000
EXECUTION_PROJ 300000 000000 X
EXECUTION_PROJ 310000 000000 X X
MASTER_PROJECT 100000 000000
MASTER_PROJECT 101000 100000
MASTER_PROJECT 102000 100000 X
MASTER_PROJECT 200000 000000
MASTER_PROJECT 300000 000000
MASTER_PROJECT 310000 300000

I can determine the top-level project easily (or i can hardcode it, doesn't matter); and i can show hierarchy of tasks, that also works fine (sorry, working in the ABAP environment, can't create Oracle tables; trying to emulate this with select from dual):

with projects as (
select 'MASTER_PROJECT' project_id, ' ' parent_project_id from dual
union all
select 'EXECUTION_PROJ', 'MASTER_PROJECT' from dual
union all
select 'UNRELATED_PROJ', ' ' from dual),

toplvl_proj as (
select project_id from projects
where connect_by_isleaf = 1
connect by prior parent_project_id = project_id
start with project_id = 'EXECUTION_PROJ'),

tasks(PROJECT_ID, TASK_ID, PARENT_TASK_ID, INACTIVE, REDEFINED) as (
select 'EXECUTION_PROJ', '101000', '100000', 'X', 'X' from dual union all
select 'EXECUTION_PROJ', '101010', '101000', 'X', ' ' from dual union all
select 'EXECUTION_PROJ', '101011', '101010', 'X', ' ' from dual union all
select 'EXECUTION_PROJ', '101012', '101010', ' ', ' ' from dual union all
select 'EXECUTION_PROJ', '102000', '100000', ' ', 'X' from dual union all
select 'EXECUTION_PROJ', '102010', '102000', ' ', ' ' from dual union all
select 'EXECUTION_PROJ', '102011', '102010', ' ', ' ' from dual union all
select 'EXECUTION_PROJ', '210000', '200000', ' ', ' ' from dual union all
select 'EXECUTION_PROJ', '300000', '000000', ' ', 'X' from dual union all
select 'EXECUTION_PROJ', '310000', '000000', 'X', 'X' from dual union all
select 'MASTER_PROJECT', '100000', '000000', ' ', ' ' from dual union all
select 'MASTER_PROJECT', '101000', '100000', ' ', ' ' from dual union all
select 'MASTER_PROJECT', '102000', '100000', 'X', ' ' from dual union all
select 'MASTER_PROJECT', '200000', '000000', ' ', ' ' from dual union all
select 'MASTER_PROJECT', '300000', '000000', ' ', ' ' from dual union all
select 'MASTER_PROJECT', '310000', '300000', ' ', ' ' from dual)

select t.*, p.parent_project_id, level tsklvl
from tasks t
join projects p on p.project_id = t.project_id
connect by prior t.task_id = t.parent_task_id
start with t.project_id in (select project_id from toplvl_proj)
and t.parent_task_id = '000000'

SQL Fiddle

What i don't understand is how to leave only the lowest level (from the task_id perspective) - that is, if a task exists only in the parent project or only in the execution project, select it; if it exists in both projects, select only the record belonging to the execution project.

The above query produces this result:

PROJECT_ID TASK_ID PARENT_TASK_ID INACTIVE REDEFINED PARENT_PROJECT_ID TSKLVL
MASTER_PROJECT 100000 000000 1
EXECUTION_PROJ 101000 100000 X X MASTER_PROJECT 2
EXECUTION_PROJ 101010 101000 X MASTER_PROJECT 3
EXECUTION_PROJ 101011 101010 X MASTER_PROJECT 4
EXECUTION_PROJ 101012 101010 MASTER_PROJECT 4
MASTER_PROJECT 101000 100000 2
EXECUTION_PROJ 101010 101000 X MASTER_PROJECT 3
EXECUTION_PROJ 101011 101010 X MASTER_PROJECT 4
EXECUTION_PROJ 101012 101010 MASTER_PROJECT 4
EXECUTION_PROJ 102000 100000 X MASTER_PROJECT 2
EXECUTION_PROJ 102010 102000 MASTER_PROJECT 3
EXECUTION_PROJ 102011 102010 MASTER_PROJECT 4
MASTER_PROJECT 102000 100000 X 2
EXECUTION_PROJ 102010 102000 MASTER_PROJECT 3
EXECUTION_PROJ 102011 102010 MASTER_PROJECT 4
MASTER_PROJECT 200000 000000 1
EXECUTION_PROJ 210000 200000 MASTER_PROJECT 2
MASTER_PROJECT 300000 000000 1
MASTER_PROJECT 310000 300000 2

What i need is this (duplicate blocks removed, tasks starting with 3* show EXECUTION_PROJ instead of MASTER_PROJECT and respective values in REDEFINED and INACTIVE fields):

PROJECT_ID TASK_ID PARENT_TASK_ID INACTIVE REDEFINED PARENT_PROJECT_ID TSKLVL
MASTER_PROJECT 100000 000000 1
EXECUTION_PROJ 101000 100000 X X MASTER_PROJECT 2
EXECUTION_PROJ 101010 101000 X MASTER_PROJECT 3
EXECUTION_PROJ 101011 101010 X MASTER_PROJECT 4
EXECUTION_PROJ 101012 101010 MASTER_PROJECT 4
EXECUTION_PROJ 102000 100000 X MASTER_PROJECT 2
EXECUTION_PROJ 102010 102000 MASTER_PROJECT 3
EXECUTION_PROJ 102011 102010 MASTER_PROJECT 4
MASTER_PROJECT 200000 000000 1
EXECUTION_PROJ 210000 200000 MASTER_PROJECT 2
EXECUTION_PROJ 300000 000000 1
EXECUTION_PROJ 310000 300000 2

Can this be achieved?


Solution

  • Silly me. ROW_NUMBER() allows to get rid of unnecessary duplicates, and then hierarchical query doesn't need to worry about cross-project connections:

    with projects as (
    select 'MASTER_PROJECT' project_id, ' ' parent_project_id from dual
    union all
    select 'EXECUTION_PROJ', 'MASTER_PROJECT' from dual
    union all
    select 'UNRELATED_PROJ', ' ' from dual),
    
    proj_hier as (
    select project_id, level proj_level from projects
    connect by parent_project_id = prior project_id 
    start with project_id = 'MASTER_PROJECT'),
    
    tasks(PROJECT_ID, TASK_ID, PARENT_TASK_ID, INACTIVE, REDEFINED) as (
    select 'EXECUTION_PROJ', '101000', '100000', 'X', 'X' from dual union all
    select 'EXECUTION_PROJ', '101010', '101000', 'X', ' ' from dual union all
    select 'EXECUTION_PROJ', '101011', '101010', 'X', ' ' from dual union all
    select 'EXECUTION_PROJ', '101012', '101010', ' ', ' ' from dual union all
    select 'EXECUTION_PROJ', '102000', '100000', ' ', 'X' from dual union all
    select 'EXECUTION_PROJ', '102010', '102000', ' ', ' ' from dual union all
    select 'EXECUTION_PROJ', '102011', '102010', ' ', ' ' from dual union all
    select 'EXECUTION_PROJ', '210000', '200000', ' ', ' ' from dual union all
    select 'EXECUTION_PROJ', '300000', '000000', ' ', 'X' from dual union all
    select 'EXECUTION_PROJ', '310000', '000000', 'X', 'X' from dual union all
    select 'MASTER_PROJECT', '100000', '000000', ' ', ' ' from dual union all
    select 'MASTER_PROJECT', '101000', '100000', ' ', ' ' from dual union all
    select 'MASTER_PROJECT', '102000', '100000', 'X', ' ' from dual union all
    select 'MASTER_PROJECT', '200000', '000000', ' ', ' ' from dual union all
    select 'MASTER_PROJECT', '300000', '000000', ' ', ' ' from dual union all
    select 'MASTER_PROJECT', '310000', '300000', ' ', ' ' from dual),
    
    proj_tasks as (
    select t.*, ph.proj_level,
           row_number() over(partition by t.task_id order by ph.proj_level desc) rn
    from tasks t
    join proj_hier ph on t.project_id = ph.project_id)
    
    
    select t.*, level task_level from proj_tasks t
    where rn = 1
    connect by prior task_id = parent_task_id
    start with parent_task_id = '000000'
    

    Sorry... it took me the whole day and a post to StackOverflow to realise what the solution should look like...