Using Azure SQL Server, I have a table that store an organization's tree structure in a single table that looks like this:
CREATE TABLE [dbo].[UserManagement_GroupDef]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PortalId] [bigint] NOT NULL,
[GroupType] [int] NOT NULL,
[ParentGroupId] [bigint] NULL,
[GroupName] [nvarchar](2048) NOT NULL,
[IsDefault] [bit] NOT NULL,
[Email] [nvarchar](256) NULL,
[PhoneNumber] [varchar](10) NULL,
[UserManagementId] [nvarchar](450) NULL,
[Address] [nvarchar](450) NULL,
[Suite] [nvarchar](450) NULL,
[City] [nvarchar](450) NULL,
[State] [nvarchar](450) NULL,
[Position] [nvarchar](450) NULL,
[Zip] [nvarchar](450) NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_UserManagement_GroupDef] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
All columns are indexed.
[ParentGroupId]
references another row in the same table. The table, all in all, represents a tree structure.
In the query, below, I am attempting to find all descendent children of group ID 155
.
WITH tblChild AS
(
SELECT
ID
,ParentGroupId
,[IsDeleted]
,[PortalId]
,[GroupType]
,[GroupName]
,[IsDefault]
,[Email]
,[PhoneNumber]
,[UserManagementId]
,[Address]
,[Suite]
,[City]
,[State]
,[Position]
,[Zip]
FROM UserManagement_GroupDef
WHERE ID = 155 AND IsDeleted = 0
UNION ALL
SELECT
UserManagement_GroupDef.ID,
UserManagement_GroupDef.ParentGroupId,
UserManagement_GroupDef.[IsDeleted],
UserManagement_GroupDef.[PortalId],
UserManagement_GroupDef.[GroupType],
UserManagement_GroupDef.[GroupName],
UserManagement_GroupDef.[IsDefault],
UserManagement_GroupDef.[Email],
UserManagement_GroupDef.[PhoneNumber],
UserManagement_GroupDef.[UserManagementId],
UserManagement_GroupDef.[Address],
UserManagement_GroupDef.[Suite],
UserManagement_GroupDef.[City],
UserManagement_GroupDef.[State],
UserManagement_GroupDef.[Position],
UserManagement_GroupDef.[Zip]
FROM UserManagement_GroupDef
JOIN
tblChild
ON
UserManagement_GroupDef.ParentGroupId = tblChild.ID
WHERE tblChild.IsDeleted = 0
)
SELECT
tblChild.ID,
tblChild.ParentGroupId,
tblChild.[IsDeleted],
tblChild.[PortalId],
tblChild.[GroupType],
tblChild.[GroupName],
tblChild.[IsDefault],
tblChild.[Email],
tblChild.[PhoneNumber],
tblChild.[UserManagementId],
tblChild.[Address],
tblChild.[Suite],
tblChild.[City],
tblChild.[State],
tblChild.[Position],
tblChild.[Zip]
FROM tblChild
The table currently has just under 7000 records. This particular query returns a result with just under 2000 rows. The result is ultimately correct but takes ~20 seconds to execute.
Is there a way to speed this query up or to achieve the same result, only faster?
Execution plan:
https://www.brentozar.com/pastetheplan/?id=S1k0Qdrzd
Table Indices:
CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef] ON [dbo].[UserManagement_GroupDef]
(
[IsDeleted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_2] ON [dbo].[UserManagement_GroupDef]
(
[ParentGroupId] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_3] ON [dbo].[UserManagement_GroupDef]
(
[IsDeleted] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [umg_query_index] ON [dbo].[UserManagement_GroupDef]
(
[ParentGroupId] ASC
)
INCLUDE(
[ID],
[IsDeleted],
[PortalId],
[GroupType],
[GroupName],
[IsDefault],
[Email],
[PhoneNumber],
[UserManagementId],
[Address],
[Suite],
[City],
[State],
[Position],
[Zip]
)
WHERE ([IsDeleted]=(0))
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
It looks like what is going on is that the compiler is not able to push down the outer predicate IsDeleted = 0
into the CTE. It therefore cannot use the umg_query_index
as recommended by O. Jones.
Why that is the case, is because each run of the recursive section could return rows which are IsDeleted = 1
, which need to feed back into the next run. While it is true that they do not appear in the final result, it is still possible that such a row may have child rows that do need to appear in the final resultset. So there is no way to eliminate them from the recursive join.
You have two options:
IX_UserManagement_GroupDef_2
index, so that it also contains the IsDeleted
column, it does not need to be in the key, it can be an INCLUDE
:CREATE NONCLUSTERED INDEX [IX_UserManagement_GroupDef_2] ON [dbo].[UserManagement_GroupDef]
([ParentGroupId] ASC) INCLUDE (IsDeleted)
WITH (DROP_EXISTING = ON, ONLINE = ON) ON [PRIMARY]
WITH tblChild AS
(
SELECT *
FROM UserManagement_GroupDef
WHERE ID = 155 AND IsDeleted = 0
UNION ALL
SELECT
u.*
FROM
UserManagement_GroupDef u
JOIN
tblChild ON u.ParentGroupId = tblChild.ID
WHERE u.IsDeleted = 0
)
SELECT *
FROM tblChild
This query has different semantics from your original one. It will not return any rows for which the parent is IsDeleted = 1
.
At this point you could also change your new index to be a filtered one, as this means any IsDeleted
rows will not be stored.
CREATE NONCLUSTERED INDEX umg_query_index ON UserManagement_GroupDef
(ParentGroupId) INCLUDE (IsDeleted) WHERE IsDeleted = 0
WITH (DROP_EXISTING = ON, ONLINE = ON) ON [PRIMARY];
You must include the IsDeleted
column in the index, otherwise the optimizer may fail to use it, due to faulty logic as currently implemented.
One further note: single-column indexes are not usually that useful and should generally be avoided. It is not worth it for the server to try union two indexes together, it will scan the clustered index instead.
I suggest you drop the indexes IX_UserManagement_GroupDef
and IX_UserManagement_GroupDef_3
as they are now completely superfluous.