I have below table like: SQL fiddle
I am able to get this output via XML, but I am not sure how I can get below output properly for larger number of users (approx 0.2M users).
Later I want to get top-3 Names by their counts for each id ,so RANK or OrderBy clauses will come into SQL and not sure how many iteration will it take when data is of large number of users.
Working code that I have tried:
-----------SQL Raw Table Creation------------------------
CREATE TABLE tb
(
Id INT,
Name VARCHAR(50) NOT NULL
);
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'bb');
INSERT INTO tb (Id, Name) VALUES (1, 'cc');
INSERT INTO tb (Id, Name) VALUES (1, 'cc');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (2, 'aa');
INSERT INTO tb (Id, Name) VALUES (2, 'bb');
INSERT INTO tb (Id, Name) VALUES (2, 'bb');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (3, 'aa');
INSERT INTO tb (Id, Name) VALUES (3, 'bb');
INSERT INTO tb (Id, Name) VALUES (3, 'cc');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
-----------------Want to RANK or get only top 3 rows for each Id when group by Name--------------
select f.* into #t1
from(
select f.*
from(
select f.*
from (
select top 3 id,name,count(name) as total
from tb
where id = 1
group by id,name
order by id,total desc
)f
Union
select top 3 id,name,count(name) as total
from tb
where id = 2
group by id,name
order by id,total desc
)f
Union
select top 3 id,name,count(name) as total
from tb
where id = 3
group by id,name
order by id,total desc
) f
/* Output is moved in temp table #t1 which looks like
id name total
1 aa 5
1 cc 2
1 dd 3
2 aa 1
2 bb 2
2 ee 4
3 bb 1
3 cc 1
3 dd 3
*/
---------Final Joining for each Top3Names and RespectiveTotal -----
select a.id as ID, a.listStr as Top3Names , b.Total as RespectiveTotal
from
(
SELECT id,STUFF((SELECT ',' + name
FROM #t1 EE
WHERE EE.id=E.id
FOR XML PATH('')), 1, 1, '') AS listStr
FROM #t1 E
GROUP BY E.id
)a
left Join
(
SELECT id,STUFF((SELECT ',' + cast(total as Varchar)
FROM #t1 EE
WHERE EE.id=E.id
FOR XML PATH('')), 1, 1, '') AS Total
FROM #t1 E
GROUP BY E.id
)b
on a.id=b.id
Output:
ID Top3Names RespectiveTotal
1 aa,cc,dd 5,2,3
2 aa,bb,ee 1,2,4
3 bb,cc,dd 1,1,3
Here I am using UNION for each ID, which is not correct way of doing. I want an optimized way. Also I am using a temp table to store my results. Is it a good way? Let me know for any correct solution or alternatives so that I can test it on larger.
on my SQL SERVER machine for given sample data, your query stats looks like this:
if you are using SQL SERVER 2017+ you can use STRING_AGG
function :
SELECT
id
, STRING_AGG(name,',') WITHIN GROUP (order by name asc) Top3Names
, STRING_AGG(countx,',') WITHIN GROUP (order by name asc) RespectiveTotal
FROM (
SELECT
id
, name
, count(*) countx
, ROW_NUMBER() over (partition by id order by count(*) desc) rownumber
FROM tb
GROUP BY name, id
) result1
WHERE
result1.rownumber < 4
GROUP BY id
Stats are like :
for SQL SERVER 2016- :
select id
, STUFF((
SELECT ',' + t1.Name
FROM cte t1
WHERE t1.id = t2.id
and t1.rownumber < 4
ORDER BY t1.name
FOR XML PATH('')), 1, LEN(','), '') AS Top3Names
, STUFF((
SELECT ',' + cast(t1.countx as varchar(50))
FROM cte t1
WHERE t1.id = t2.id --and t1.name = t2.name
and t1.rownumber < 4
ORDER BY t1.name
FOR XML PATH('')), 1, LEN(','), '') AS RespectiveTotal
from cte t2
group by id
Stats are like :
so regardless of sql server version, It will improve performance, you will get the best performance if you are using sql server 2017 or above using query above.