Search code examples
google-bigqueryhierarchy

How to display multiple hierarchical tree structure in BigQuery


I am working on the tree hierarchy of supervisors and their supervised employees. The difficulty is that some supervisors are employees supervised by other supervisors, and there are lots of it.

For SQL queries I acquired from class, only about simple self-joins, which might be only like two levels: A is supervised by B, and that's it.

But the issue from the real world is far more complicated. There are multiple levels, and I am not sure about the exact number. For example, A is supervised by B, and B is supervised by C, and C is supervised by D, etc.

I assume there are only like 5 or more levels for supervision. The raw data might be like this:

     Employee     Supervisor
        A             B
        C             B
        D             B
        B             V
        E             V
        F             E
        G             V
        V          (Blank which indicates no boss)
        H             A

The codes provided by some BigQuery expert is as below:

#standardSQL
SELECT t.Supervisor,
  IF(t.Supervisor = t5.Supervisor, 
    STRUCT(Employee2 AS Employee1, NULL AS Employee2),
    STRUCT(t5.Supervisor AS Employee1, Employee2 AS Employee2)
  ).*
FROM (
  SELECT t1.Employee Supervisor,
    COALESCE(t4.Employee, t3.Employee, t2.Employee) Employee2
  FROM `project.dataset.table` t1
  LEFT JOIN `project.dataset.table` t2 ON t2.Supervisor = t1.Employee
  LEFT JOIN `project.dataset.table` t3 ON t3.Supervisor = t2.Employee
  LEFT JOIN `project.dataset.table` t4 ON t4.Supervisor = t3.Employee
  WHERE t1.Supervisor IS NULL
) t
LEFT JOIN `project.dataset.table` t5 ON t5.Employee = t.Employee2

and the result turned to be like this:

Row Supervisor  Employee1   Employee2    
1   V           B           A    
2   V           B           C    
3   V           B           D    
4   V           E           F    
5   V           G           null  

But we want is :

  Row Supervisor  Employee1   Employee2  Employee3  
    1   V           B           A          H
    2   V           B           C         Null
    3   V           B           D         Null
    4   V           E           F         Null
    5   V           G           null      Null

Then how to change the codes if I would like to have more levels of hierarchies? which means if I would like to add employee3, or 4, how can I edit it? Thanks!


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    WITH e0 AS (
      SELECT Employee AS Supervisor FROM `project.dataset.table` WHERE Supervisor IS NULL
    ), e1 AS (
      SELECT e.Supervisor, Employee AS Employee1 
      FROM e0 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Supervisor
    ), e2 AS (
      SELECT e.Supervisor, Employee1, Employee AS Employee2
      FROM e1 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee1
    ), e3 AS (
      SELECT e.Supervisor, Employee1, Employee2, Employee AS Employee3
      FROM e2 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee2
    )
    SELECT * FROM e3   
    

    if to apply to sample data from your question - result/output is

    Row Supervisor  Employee1   Employee2   Employee3    
    1   V           B           A           H    
    2   V           B           C           null     
    3   V           B           D           null     
    4   V           E           F           null     
    5   V           G           null        null       
    

    You can easily extend above adding more levels like below (replacing and with respective numbers like 4, 5, 6, 7 etc.) Obviously up to reasonable extend

    e<N> AS (
      SELECT e.Supervisor, Employee1, Employee2, Employee3, ... , Employee AS Employee<N>
      FROM e<N-1> e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee<N-1>
    )   
    SELECT * FROM e<N>     
    

    for example

    #standardSQL
    WITH e0 AS (
      SELECT Employee AS Supervisor FROM `project.dataset.table` WHERE Supervisor IS NULL
    ), e1 AS (
      SELECT e.Supervisor, Employee AS Employee1 
      FROM e0 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Supervisor
    ), e2 AS (
      SELECT e.Supervisor, Employee1, Employee AS Employee2
      FROM e1 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee1
    ), e3 AS (
      SELECT e.Supervisor, Employee1, Employee2, Employee AS Employee3
      FROM e2 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee2
    ), e4 AS (
      SELECT e.Supervisor, Employee1, Employee2, Employee3, Employee AS Employee4
      FROM e3 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee3
    ), e5 AS (
      SELECT e.Supervisor, Employee1, Employee2, Employee3, Employee4, Employee AS Employee5
      FROM e4 e LEFT JOIN `project.dataset.table` t ON t.Supervisor = e.Employee4
    )
    SELECT * FROM e5