Search code examples
sqlgoogle-bigqueryrecursive-queryself-join

How to display self-joined tree hierarchy table by using BigQuery?


I am working on the tree hierarchy of supervisors and their supervised employees. The difficulty is that some supervisors are employees supervised by other supervisors, and there are lots of it.

For SQL queries I acquired from class, only about simple self-joins, which might be only like two levels: A is supervised by B, and that's it.

But the issue from the real world is far more complicated. There are multiple levels, and I am not sure about the exact number. For example, A is supervised by B, and B is supervised by C, and C is supervised by D, etc.

I assume there are only like 4 levels for supervision. The raw data might be like this:

         Employee     Supervisor
            A             B
            C             B
            D             B
            B             V
            E             V
            F             E
            G             V
            V          (Blank which indicates no boss)

And thus I generated the codes like below:

SELECT T1.Supervisor,ifnull(T2.EmployeeName, 'No Boss') as sup_emp1, 
ifnull(T3.EmployeeName, 'No Boss') as sup_emp2, ifnull(T4.EmployeeName, 
'No boss') as sup_emp3, ifnull(T5.EmployeeName, 'No Boss') as EMPLOYEE
FROM  T1
LEFT JOIN T2
ON T1.Supervisor=T2.EmployeeName
LEFT JOIN  T3
ON T2.Supervisor = T3.EmployeeName
LEFT JOIN T4
ON T3.Supervisor = T4.EmployeeName
LEFT JOIN T5
ON T4.Supervisor = T5.EmployeeName
WHERE T5. Emp_Status = 'Active'
ORDER BY T1.Supervisor ASC

However, the results turned out to be wrong. I just got so confused about the tables the relationship and failed to think of the recursive joins.

The expected result should be like this:

Supervisor     # of Employees     
   V              7(include indirect supervised employees)
   B              3
   E              1

And the organization table should be:

Supervisor   Employee1    Employee2
   V             B             A
   V             B             C
   V             B             D
   V             E             F
   V             G            Null

But my result turned out to be

Supervisor   Employee1    Employee2
   V             B             A
   V             B             C
   V             B             D
   V             E             F
   V             V             G  

<- I need the G to be on the second level.

So, can anyone help? Thanks!


Solution

  • ... assume there are only like 4 levels for supervision ...

    Below is for BigQuery Standard SQL and easily extended to more (reasonably more) levels just by adding more LEFT JOINs and respective fields into COALESCE

    #standardSQL
    SELECT t.Supervisor,
      IF(t.Supervisor = t5.Supervisor, 
        STRUCT(Employee2 AS Employee1, NULL AS Employee2),
        STRUCT(t5.Supervisor AS Employee1, Employee2 AS Employee2)
      ).*
    FROM (
      SELECT t1.Employee Supervisor,
        COALESCE(t4.Employee, t3.Employee, t2.Employee) Employee2
      FROM `project.dataset.table` t1
      LEFT JOIN `project.dataset.table` t2 ON t2.Supervisor = t1.Employee
      LEFT JOIN `project.dataset.table` t3 ON t3.Supervisor = t2.Employee
      LEFT JOIN `project.dataset.table` t4 ON t4.Supervisor = t3.Employee
      WHERE t1.Supervisor IS NULL
    ) t
    LEFT JOIN `project.dataset.table` t5 ON t5.Employee = t.Employee2
    

    If to apply to sample data from your question - result is

    Row Supervisor  Employee1   Employee2    
    1   V           B           A    
    2   V           B           C    
    3   V           B           D    
    4   V           E           F    
    5   V           G           null