I am working on the tree hierarchy of supervisors and their supervised employees. The difficulty is that some supervisors are employees supervised by other supervisors, and there are lots of it.
For SQL queries I acquired from class, only about simple self-joins, which might be only like two levels: A is supervised by B, and that's it.
But the issue from the real world is far more complicated. There are multiple levels, and I am not sure about the exact number. For example, A is supervised by B, and B is supervised by C, and C is supervised by D, etc.
I assume there are only like 5 or more levels for supervision. The raw data might be like this:
Employee Supervisor
A B
C B
D B
B V
E V
F E
G V
V (Blank which indicates no boss)
H A
The codes provided by some BigQuery expert is as below:
#standardSQL
SELECT t.Supervisor,
IF(t.Supervisor = t5.Supervisor,
STRUCT(Employee2 AS Employee1, NULL AS Employee2),
STRUCT(t5.Supervisor AS Employee1, Employee2 AS Employee2)
).*
FROM (
SELECT t1.Employee Supervisor,
COALESCE(t4.Employee, t3.Employee, t2.Employee) Employee2
FROM `project.dataset.table` t1
LEFT JOIN `project.dataset.table` t2 ON t2.Supervisor = t1.Employee
LEFT JOIN `project.dataset.table` t3 ON t3.Supervisor = t2.Employee
LEFT JOIN `project.dataset.table` t4 ON t4.Supervisor = t3.Employee
WHERE t1.Supervisor IS NULL
) t
LEFT JOIN `project.dataset.table` t5 ON t5.Employee = t.Employee2
and the result turned to be like this:
Row Supervisor Employee1 Employee2
1 V B A
2 V B C
3 V B D
4 V E F
5 V G null
But we want is :
Row Supervisor Employee1 Employee2 Employee3
1 V B A H
2 V B C Null
3 V B D Null
4 V E F Null
5 V G null Null
Then how to change the codes if I would like to have more levels of hierarchies? which means if I would like to add employee3, or 4, how can I edit it? Thanks!
Below is for BigQuery Standard SQL
#standardSQL
WITH e0 AS (
SELECT Employee AS Supervisor FROM `project.dataset.table` WHERE Supervisor IS NULL
), e1 AS (
SELECT e.Supervisor, Employee AS Employee1
FROM e0 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Supervisor
), e2 AS (
SELECT e.Supervisor, Employee1, Employee AS Employee2
FROM e1 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee1
), e3 AS (
SELECT e.Supervisor, Employee1, Employee2, Employee AS Employee3
FROM e2 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee2
)
SELECT * FROM e3
if to apply to sample data from your question - result/output is
Row Supervisor Employee1 Employee2 Employee3
1 V B A H
2 V B C null
3 V B D null
4 V E F null
5 V G null null
You can easily extend above adding more levels like below (replacing and with respective numbers like 4, 5, 6, 7 etc.) Obviously up to reasonable extend
e<N> AS (
SELECT e.Supervisor, Employee1, Employee2, Employee3, ... , Employee AS Employee<N>
FROM e<N-1> e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee<N-1>
)
SELECT * FROM e<N>
for example
#standardSQL
WITH e0 AS (
SELECT Employee AS Supervisor FROM `project.dataset.table` WHERE Supervisor IS NULL
), e1 AS (
SELECT e.Supervisor, Employee AS Employee1
FROM e0 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Supervisor
), e2 AS (
SELECT e.Supervisor, Employee1, Employee AS Employee2
FROM e1 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee1
), e3 AS (
SELECT e.Supervisor, Employee1, Employee2, Employee AS Employee3
FROM e2 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee2
), e4 AS (
SELECT e.Supervisor, Employee1, Employee2, Employee3, Employee AS Employee4
FROM e3 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee3
), e5 AS (
SELECT e.Supervisor, Employee1, Employee2, Employee3, Employee4, Employee AS Employee5
FROM e4 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee4
)
SELECT * FROM e5