Search code examples
sqlsql-serverfor-xml-pathstuff

How can i avoid a multiple using stuff for xml path?


I'm having a problem with sql query. What i'm trying to do is to get list of stores with tasks assigned on users in it. Each role in separated column with user names. I'm using STUFF from getting concatenated row but it looks like bad idea for 100K+ rows in tables.

This is simplified structure:

Users table

CREATE TABLE #temp_Users(
    [id] [int] IDENTITY(1,1) NOT NULL,  
    [user_name] [nvarchar](250) NULL,
 CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([id] ASC)
 )

 insert into #temp_Users (user_name) values ('Joe'),('Jeff'),('Jimm')

Tasks table

create table #temp_Tasks (
    [id] [int] IDENTITY(1,1) NOT NULL,
    [id_user] [int] NULL,
    [id_store] [int] NULL,
    [id_role] [int] NULL,
    CONSTRAINT [PK_Tasks] PRIMARY KEY CLUSTERED  ([id] ASC)
) 

insert into #temp_Tasks ([id_user],[id_store],[id_role])
       values (1,1,0),(1,2,0),(2,1,0),(2,2,0),(1,1,1),(2,2,1),(3,1,0),(3,2,0),(3,2,1)

Select

SELECT distinct t.id_store,
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 0 FOR XML PATH('')),1,2,'' ) as 'role_0',
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 1 FOR XML PATH('')),1,2,'' )  as 'role_1' 
FROM #temp_Tasks t

Result

enter image description here

The problem is that select is getting slower and slower when tables filling more and more because every STUFF is another nested loop. And if i need to add some "roles" in this select i have to add another STUFF like following:

SELECT distinct t.id_store,
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 0 FOR XML PATH('')),1,2,'' ) as 'role_0',
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 1 FOR XML PATH('')),1,2,'' )  as 'role_1' 
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 2 FOR XML PATH('')),1,2,'' )  as 'role_2' 
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 3 FOR XML PATH('')),1,2,'' )  as 'role_3' 
  stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 4 FOR XML PATH('')),1,2,'' )  as 'role_4' 
FROM #temp_Tasks t

The question is can i avoid this multiple STUFF functions using? Or maybe i need to create nonclustered indexes on tables Tasks? Maybe filtered indexes each one for every role? Thanks in advance!

EDIT: i'm using MSSQL 2016.

As Gordon Linoff suggested i've changed Select from Distinct to Group By like this:

SELECT t.id_store,
      stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 0 FOR XML PATH('')),1,2,'' ) as 'role_0',
      stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 left outer join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 1 FOR XML PATH('')),1,2,'' )  as 'role_1' 
    FROM #temp_Tasks t
group by t.id_store

and execution time decreases from 20 sec to 2 sec. So the problem was not about Stuff but Distinct.


Solution

  • Since SQL Server 2017, you use string_agg()?

    SELECT t.id_store,
           STRING_AGG(CASE WHEN t.id_role = 0 THEN u.user_name END, ',') WITHIN GROUP (ORDER BY u.user_name) as role_0,
           STRING_AGG(CASE WHEN t.id_role = 1 THEN u.user_name END, ',') WITHIN GROUP (ORDER BY u.user_name) as role_1
    FROM #temp_Tasks t JOIN
         #temp_Users u
         ON u.id = t2.id_users
    GROUP BY t.id_store;
    

    In older versions, you might benefit from using GROUP BY instead of SELECT DISTINCT as Aaron Bertrand explains. I would also replace the LEFT JOINs in the subqueries with INNER JOIN:

    SELECT t.id_store,
           stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 0 FOR XML PATH('')),1,2,'' ) as 'role_0',
           stuff( (SELECT ', ' + u.[user_name] FROM #temp_Tasks t2 join #temp_Users u on u.id = t2.id_user WHERE t2.id_store = t.id_store and t2.id_role = 1 FOR XML PATH('')),1,2,'' )  as 'role_1' 
    FROM #temp_Tasks t
    GROUP BY t.id_store;