Search code examples
sql-serversql-order-bycommon-table-expressionrow-number

How to ORDER (ROW_NUMBER() OVER PARTITION BY) BY non-selected column


I have a query as:

WITH Top_Areas AS
(
SELECT AIP.aid [Author_ID], CAST(P.research_area as VARCHAR(100)) [Research_Area], 
       P.research_area_category_id [Category],
       COUNT(*) [Paper_Count],  
       P.p_year [Paper_Year]
FROM coauthor_individual_papers AIP
JOIN sub_aminer_paper P ON AIP.pid = P.pid
WHERE AIP.aid IN (901706) AND AIP.p_year IN (2005)
GROUP BY
    AIP.aid, CAST(P.research_area as VARCHAR(100)),
    P.research_area_category_id, 
    P.p_year
),
Top_Author_Areas AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY Author_ID, Paper_Year ORDER BY Paper_Count DESC, Research_Area ASC) AS Rank FROM Top_Areas)

SELECT * 
FROM CTE1
WHERE Rank <= 5  

which gives output as:

Author_ID   Research_Area   Category    Paper_Count Paper_Year  Rank  
--------------------------------------------------------------------
901706      feature         8           1           2005        1
901706      image annotate  11          1           2005        2
901706      image db        11          1           2005        3
901706      retrieval model 12          1           2005        4  

This output is just ORDERING BY Research_Area i.e. ASC whereas I also need ORDERING BY Paper_Count i.e. DESC, whereas the desired output should be like:

Author_ID   Category    Paper_Count Paper_Year  Rank  
----------------------------------------------------
901706      11          2           2005        1
901706      8           1           2005        2
901706      12          1           2005        3  

It is possible when we not to select Research_Area in SELECT list, but if we not select Research_Area, then how to ORDER BY Research_Area ASC, whereas we want to use ORDER BY Research_Area ASC option when Paper_Count value is same as shown in desired output above i.e.
Category 11 is at Rank 1 because of highest value of Paper_Count,
Category 8 and 11 are at Rank 2 and 3 respectively because of following reasons:

  • Paper_Count value is same.
  • So have to ORDER Rank 2 and 3 as by Research_Area ASC

How can we get desired output?


Solution

  • Since you still don't seem to get my point about there being possibly different Research_Area values when we're trying to implement your tie-breaking rule, I'll decide arbitrarily - we'll use the earliest Research_Area value within each Category to do the tie-breaking:

    WITH Top_Areas AS
    (
    SELECT AIP.aid [Author_ID],
    
      MIN(CAST(P.research_area as VARCHAR(100))) [Research_Area],
    
           P.research_area_category_id [Category],
           COUNT(*) [Paper_Count],  
           P.p_year [Paper_Year]
    FROM coauthor_individual_papers AIP
    JOIN sub_aminer_paper P ON AIP.pid = P.pid
    WHERE AIP.aid IN (901706) AND AIP.p_year IN (2005)
    GROUP BY
        AIP.aid,
    
        --CAST(P.research_area as VARCHAR(100)),
    
        P.research_area_category_id, 
        P.p_year
    ),
    Top_Author_Areas AS
    (SELECT *, ROW_NUMBER() OVER(PARTITION BY Author_ID, Paper_Year
                                 ORDER BY Paper_Count DESC, Research_Area ASC) AS Rank
     FROM Top_Areas)
    
    SELECT * 
    FROM CTE1
    WHERE Rank <= 5
    

    Changed areas are on lines separated from rest of the query by blank lines - I'm just applying MIN to Research_Area in order to obtain a single value for use in the tie-breaking, and have removed it from the GROUP BY, both changes within the Top_Areas CTE.