Search code examples
sqlsql-serverhierarchydenormalized

Find Table Records That Don't Match Top Down Hierarchy Structure


I have a denormalized table that I'm attempting to evaluate for hierarchical issues. The correct format is a top down hierarchy structure for a large organization. Conceptually, everyone should have just one reporting manager, but since it's denormalized, this isn't always the case. I want to find all of the records where a particular employee has more than one reporting manager. Here's some example data:

LVL1_First_Name|LVL1_Last_Name|LVL1_Emp_ID|LVL2_First_Name|LVL2_Last_Name|LVL2_Emp_ID|LVL3_First_Name|LVL3_Last_Name|LVL3_Emp_ID
---------------+--------------+-----------+---------------+--------------+-----------+---------------+--------------+-----------
Jack           |Gorshon       |0001       |Henry          |Johnson       |0003       |Amy            |Taylor        |0005
Robert         |Harris        |0002       |Henry          |Johnson       |0003       |Nick           |Greer         |0006
Robert         |Harris        |0002       |Jim            |Balial        |0004       |Josh           |Wolfe         |0007

As you can see, there is an issue with the hierarchy as Henry Johnson reports to both Robert and Jack. I'm looking to create a SQL statement that would pull out the first two records of this table because they don't follow a true top down hierarchy.

I've attempted to solve this with a group by ... having statement but it isn't getting me the results I need.

Note: I'm not in charge of this data format. My goal is to sanitize it and clean it up before putting it into a proper, normalized data structure.


Solution

  • Hmmm . . . you can unpivot the data to normalize it and then aggregate:

    select v.emp_id, min(lev), max(lev)
    from t cross apply 
         (values (LVL1_Emp_ID, LVL2_Emp_ID, 2), (LVL1_Emp_ID, LVL3_Emp_ID, 3)
         ) v(parent_emp_id, emp_id, lev)
    from v.emp_id
    group by v.emp_id
    having min(parent_emp_id) <> max(parent_emp_id);