Search code examples
sql-servert-sqlazure-sql-databasequery-optimization

Tree Structure Query in SQL Server


Using Azure SQL Server, I have a table that store an organization's tree structure in a single table that looks like this:

    CREATE TABLE [dbo].[UserManagement_GroupDef]
    (
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [PortalId] [bigint] NOT NULL,
        [GroupType] [int] NOT NULL,
        [ParentGroupId] [bigint] NULL,
        [GroupName] [nvarchar](2048) NOT NULL,
        [IsDefault] [bit] NOT NULL,
        [Email] [nvarchar](256) NULL,
        [PhoneNumber] [varchar](10) NULL,
        [UserManagementId] [nvarchar](450) NULL,
        [Address] [nvarchar](450) NULL,
        [Suite] [nvarchar](450) NULL,
        [City] [nvarchar](450) NULL,
        [State] [nvarchar](450) NULL,
        [Position] [nvarchar](450) NULL,
        [Zip] [nvarchar](450) NULL,
        [IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_UserManagement_GroupDef] PRIMARY KEY CLUSTERED     
   (
    [ID] ASC
   )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
   ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

All columns are indexed.

[ParentGroupId] references another row in the same table. The table, all in all, represents a tree structure.

In the query, below, I am attempting to find all descendent children of group ID 155.

WITH tblChild AS
    (
        SELECT 
        ID
        ,ParentGroupId
        ,[IsDeleted]
        ,[PortalId]
        ,[GroupType]
        ,[GroupName]
        ,[IsDefault]
        ,[Email]
        ,[PhoneNumber]
        ,[UserManagementId]
        ,[Address]
        ,[Suite]
        ,[City]
        ,[State]
        ,[Position]
        ,[Zip]
            FROM UserManagement_GroupDef    
            WHERE ID = 155 AND IsDeleted = 0
        UNION ALL
        SELECT 
             UserManagement_GroupDef.ID,
             UserManagement_GroupDef.ParentGroupId, 
             UserManagement_GroupDef.[IsDeleted],
             UserManagement_GroupDef.[PortalId],
             UserManagement_GroupDef.[GroupType],
             UserManagement_GroupDef.[GroupName],
             UserManagement_GroupDef.[IsDefault],
             UserManagement_GroupDef.[Email],
             UserManagement_GroupDef.[PhoneNumber],
             UserManagement_GroupDef.[UserManagementId],
             UserManagement_GroupDef.[Address],
             UserManagement_GroupDef.[Suite],
             UserManagement_GroupDef.[City],
             UserManagement_GroupDef.[State],
             UserManagement_GroupDef.[Position],
             UserManagement_GroupDef.[Zip]

        FROM UserManagement_GroupDef  
        JOIN 
            tblChild  
                ON 
                    UserManagement_GroupDef.ParentGroupId = tblChild.ID
                WHERE tblChild.IsDeleted = 0        
    )

    SELECT 
            tblChild.ID,
             tblChild.ParentGroupId, 
             tblChild.[IsDeleted],
             tblChild.[PortalId],
             tblChild.[GroupType],
             tblChild.[GroupName],
             tblChild.[IsDefault],
             tblChild.[Email],
             tblChild.[PhoneNumber],
             tblChild.[UserManagementId],
             tblChild.[Address],
             tblChild.[Suite],
             tblChild.[City],
             tblChild.[State],
             tblChild.[Position],
             tblChild.[Zip]
        FROM tblChild       
    

The table currently has just under 7000 records. This particular query returns a result with just under 2000 rows. The result is ultimately correct but takes ~20 seconds to execute.

Is there a way to speed this query up or to achieve the same result, only faster?

Execution plan:

https://www.brentozar.com/pastetheplan/?id=S1k0Qdrzd

Table Indices:

CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef] ON [dbo].[UserManagement_GroupDef]
(
    [IsDeleted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_2] ON [dbo].[UserManagement_GroupDef]
(
    [ParentGroupId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_3] ON [dbo].[UserManagement_GroupDef]
(
    [IsDeleted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [umg_query_index] ON [dbo].[UserManagement_GroupDef]
(
    [ParentGroupId] ASC
)
INCLUDE(

[ID],
[IsDeleted],
[PortalId],
[GroupType],
[GroupName],
[IsDefault],
[Email],
[PhoneNumber],
[UserManagementId],
[Address],
[Suite],
[City],
[State],
[Position],
[Zip]
)

WHERE ([IsDeleted]=(0))

WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

Solution

  • It looks like what is going on is that the compiler is not able to push down the outer predicate IsDeleted = 0 into the CTE. It therefore cannot use the umg_query_index as recommended by O. Jones.

    Why that is the case, is because each run of the recursive section could return rows which are IsDeleted = 1, which need to feed back into the next run. While it is true that they do not appear in the final result, it is still possible that such a row may have child rows that do need to appear in the final resultset. So there is no way to eliminate them from the recursive join.

    You have two options:

    1. Change your IX_UserManagement_GroupDef_2 index, so that it also contains the IsDeleted column, it does not need to be in the key, it can be an INCLUDE:
    CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_2] ON [dbo].[UserManagement_GroupDef]
    ([ParentGroupId] ASC) INCLUDE (IsDeleted)
    WITH (DROP_EXISTING = ON, ONLINE = ON) ON [PRIMARY]
    
    1. Probably the better option, push down the predicate yourself. Then it will use O. Jones' index on both parts of the CTE.
    WITH tblChild AS
    (
        SELECT *
        FROM UserManagement_GroupDef 
        WHERE ID = 155 AND IsDeleted = 0                
    
        UNION ALL
    
        SELECT 
            u.* 
        FROM 
            UserManagement_GroupDef u
        JOIN 
            tblChild ON u.ParentGroupId = tblChild.ID
        WHERE u.IsDeleted = 0
    )
    SELECT *
    FROM tblChild 
    

    This query has different semantics from your original one. It will not return any rows for which the parent is IsDeleted = 1.

    At this point you could also change your new index to be a filtered one, as this means any IsDeleted rows will not be stored.

    CREATE NONCLUSTERED INDEX umg_query_index ON UserManagement_GroupDef
    (ParentGroupId)  INCLUDE (IsDeleted) WHERE IsDeleted = 0
    WITH (DROP_EXISTING = ON, ONLINE = ON) ON [PRIMARY];
    

    You must include the IsDeleted column in the index, otherwise the optimizer may fail to use it, due to faulty logic as currently implemented.


    One further note: single-column indexes are not usually that useful and should generally be avoided. It is not worth it for the server to try union two indexes together, it will scan the clustered index instead.

    I suggest you drop the indexes IX_UserManagement_GroupDef and IX_UserManagement_GroupDef_3 as they are now completely superfluous.