Search code examples
sql-serversql-order-byquery-optimizationunionfor-xml

How to optimize the query so the manual work can be dyanmic and in optimized way


I have below table like: SQL fiddle

I am able to get this output via XML, but I am not sure how I can get below output properly for larger number of users (approx 0.2M users).

Later I want to get top-3 Names by their counts for each id ,so RANK or OrderBy clauses will come into SQL and not sure how many iteration will it take when data is of large number of users.

Working code that I have tried:

-----------SQL Raw Table Creation------------------------
CREATE TABLE tb
(
    Id INT,
    Name VARCHAR(50) NOT NULL
);

INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'aa');
INSERT INTO tb (Id, Name) VALUES (1,  'bb');
INSERT INTO tb (Id, Name) VALUES (1,  'cc');
INSERT INTO tb (Id, Name) VALUES (1,  'cc');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');
INSERT INTO tb (Id, Name) VALUES (1,  'dd');

INSERT INTO tb (Id, Name) VALUES (2,  'aa');
INSERT INTO tb (Id, Name) VALUES (2,  'bb');
INSERT INTO tb (Id, Name) VALUES (2,  'bb');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');
INSERT INTO tb (Id, Name) VALUES (2,  'ee');

INSERT INTO tb (Id, Name) VALUES (3,  'aa');
INSERT INTO tb (Id, Name) VALUES (3,  'bb');
INSERT INTO tb (Id, Name) VALUES (3,  'cc');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');
INSERT INTO tb (Id, Name) VALUES (3,  'dd');




-----------------Want to RANK or get only top 3 rows for each Id when group by Name--------------
select f.* into #t1
from(
    select f.* 
    from(
        select f.* 
        from (
        
        select top 3 id,name,count(name) as total 
        from tb 
        where id = 1
        group by id,name 
        order by id,total desc
        )f
        Union 
    
        select top 3 id,name,count(name) as total 
        from tb 
        where id = 2
        group by id,name 
        order by id,total desc
    )f
    Union 

    select top 3 id,name,count(name) as total 
    from tb 
    where id = 3
    group by id,name 
    order by id,total desc

    ) f

/* Output is moved in temp table #t1 which looks like 
id  name    total
1   aa  5
1   cc  2
1   dd  3
2   aa  1
2   bb  2
2   ee  4
3   bb  1
3   cc  1
3   dd  3 
 */

---------Final Joining for each Top3Names and RespectiveTotal -----

select a.id as ID, a.listStr as Top3Names , b.Total as RespectiveTotal
from
    (
    SELECT  id,STUFF((SELECT  ',' + name
                FROM #t1 EE
                WHERE  EE.id=E.id
           FOR XML PATH('')), 1, 1, '') AS listStr
    FROM  #t1 E
    GROUP BY E.id
    )a
    left Join
    (
    SELECT  id,STUFF((SELECT  ',' + cast(total as Varchar)
                FROM #t1 EE
                WHERE  EE.id=E.id
           FOR XML PATH('')), 1, 1, '') AS Total
    FROM  #t1 E
    GROUP BY E.id
    )b 
    on a.id=b.id

Output:

ID  Top3Names   RespectiveTotal
1   aa,cc,dd    5,2,3
2   aa,bb,ee    1,2,4
3   bb,cc,dd    1,1,3

Here I am using UNION for each ID, which is not correct way of doing. I want an optimized way. Also I am using a temp table to store my results. Is it a good way? Let me know for any correct solution or alternatives so that I can test it on larger.


Solution

  • on my SQL SERVER machine for given sample data, your query stats looks like this:

    • Total Logical Reads: 13
    • Total CPU Time: 00:00:00.007

    if you are using SQL SERVER 2017+ you can use STRING_AGG function :

    SELECT
        id
        , STRING_AGG(name,',') WITHIN GROUP (order by name asc) Top3Names  
        , STRING_AGG(countx,',')  WITHIN GROUP (order by name asc) RespectiveTotal
    FROM (
        SELECT 
          id 
          ,  name 
          , count(*) countx 
          , ROW_NUMBER() over (partition by id order by count(*)  desc) rownumber
        FROM tb
        GROUP BY name, id
        ) result1
    WHERE 
        result1.rownumber < 4
    GROUP BY id 
    

    Stats are like :

    • Total Logical Reads: 1
    • Total CPU Time: 00:00:00.000

    for SQL SERVER 2016- :

    select id
        ,  STUFF((
           SELECT ',' + t1.Name
             FROM cte t1
            WHERE t1.id = t2.id 
            and t1.rownumber <  4
            ORDER BY t1.name
              FOR XML PATH('')), 1, LEN(','), '') AS Top3Names
    
        , STUFF((
           SELECT ',' + cast(t1.countx as varchar(50))
             FROM cte t1
            WHERE t1.id = t2.id --and t1.name = t2.name 
            and t1.rownumber <  4
            ORDER BY t1.name
              FOR XML PATH('')), 1, LEN(','), '') AS RespectiveTotal
    from cte t2
    group by id
    

    Stats are like :

    • Total Logical Reads: 7
    • Total CPU Time: 00:00:00.006

    so regardless of sql server version, It will improve performance, you will get the best performance if you are using sql server 2017 or above using query above.