Search code examples
sqlsql-serversql-server-2016row-numberranking-functions

Properly implement DENSE_RANK() or similar function


I am trying to get an output using SQL Server 2016 that will return StudyID's that only have an AssignmentType of "Final" i.e. no AssignmentType of "Draft".

Some studies may have a "Draft" followed by a "Final". I do not want these. I purely want studyID's that have 1 AssignmentType which is "Final".

I currently get this data being returned:

StudyID ClientName DateSign SchedStart ActStart ActEnd SchedEnd AssignmentType
1 Client1 NULL 2022-01-05 NULL NULL NULL Final
2 Client2 NULL 2023-04-15 NULL NULL NULL Final
3 Client3 NULL 2024-01-10 NULL NULL NULL Draft
3 Client3 NULL 2024-01-25 NULL NULL NULL Final
4 Client4 NULL 2023-12-10 NULL NULL NULL Final
5 Client5 NULL 2023-12-31 NULL NULL NULL Draft
5 Client5 NULL 2024-01-20 NULL NULL NULL Final
6 Client6 NULL 2022-10-10 NULL NULL NULL Final
7 Client7 NULL 2023-09-15 NULL NULL NULL Final

I want the output to look like this:

StudyID ClientName DateSign SchedStart ActStart ActEnd SchedEnd AssignmentType
1 Client1 NULL 2022-01-05 NULL NULL NULL Final
2 Client2 NULL 2023-04-15 NULL NULL NULL Final
4 Client4 NULL 2023-12-10 NULL NULL NULL Final
6 Client6 NULL 2022-10-10 NULL NULL NULL Final
7 Client7 NULL 2023-09-15 NULL NULL NULL Final

My current code (below) returns about ~1500 rows when it should only end up being around 20-30 rows.

My 2 requirements are below and feel like they're just saying the same thing (I pretty much covered these above but maybe I am reading wrong/misinterpreting):

  1. The report shall show a list of studyid's that have a "Final" scheduled but to draft is scheduled to be completed or has been completed in the past.
  2. The report shall not show studies where a draft has been delivered.

I was thinking I could use one of the rank functions, then pick out only the records that return a "1" for AssignmentType but I am not quite sure how I would implement either of these in my code.

Here is my code:

SELECT DISTINCT
       S.StudyNo AS StudyID, 
       S.Status, 
       S.ProtocolSign,
       S.Client AS ClientName,  
       S.FinalRptSign as DateSign,
       DT.scheduledstartdate AS SchedStart, 
       DT.actualstartdate AS ActStart, 
       DT.actualenddate AS ActEnd,
       DT.scheduledenddate AS SchedEnd,
       DT.[comments] AS AssignmentType
FROM 
    dbo.View_RptStudyModule S 
JOIN 
    [dbo].[View_RptPhase] P ON P.studyrowid = S.StudyRowId 
JOIN 
    [dbo].[View_Proposal] VP ON VP.[StudyRowId] = S.StudyRowId
JOIN 
    [dbo].[View_Costcenter] CC ON CC.[ccid]=VP.[CCId]
JOIN 
    (SELECT 
         [datetypetxt],
         [PhaseRowId],
         [actualstartdate],
         [actualenddate],
         [scheduledstartdate],
         [scheduledenddate] = CASE WHEN scheduledenddate < projectedenddate THEN projectedenddate ELSE scheduledenddate END,
         [projectedenddate],
         [comments],
         DD.Detail,
         DD.DateDetailComments
     FROM 
         [dbo].[View_Datetracked] DT  
     JOIN 
         (SELECT 
              [DateTrackedRowId],
              [Detail],
              [DateDetailComments]
          FROM  
              [dbo].[View_RptDateDetail]) DD ON DD.DateTrackedRowId = DT.datetrackedrowid
     WHERE 
         [scheduledstartdate] IS NOT NULL 
         AND [datetypetxt] = 'Dataset SAS Files' 
         AND [actualenddate] IS NULL AND [scheduledenddate] IS NULL) DT ON DT.PhaseRowId = P.phaserowid
WHERE
    P.PhaseType = 'Dataset'  
    AND P.PhaseCostCenter IN ('1', '2', '3', '4', '5','6','7','8')
    AND DT.[scheduledstartdate] > '2021-01-01'
    AND S.FinalRptSign IS NULL
GROUP BY
    S.StudyNo, S.Status, S.ProtocolSign,
    S.Client, S.Duration, S.FinalRptSign,
    DT.scheduledstartdate, DT.actualstartdate, DT.actualenddate, DT.scheduledenddate,
    P.PhaseCostCenter,
    DT.[comments]
ORDER BY 
    S.StudyNo ASC, DT.[comments]

I have seen other posts on here explaining the differences between Row_Number and different Rank functions and for some reason it isn't clicking in my head. Would there be any difference?

I've tried this and variations of this but it just returns my current output + "Ranking" column which seems to just literally be ranking them from 1 up until ~1500:

SELECT DISTINCT
       S.StudyNo AS StudyID, 
       S.Status, 
       S.ProtocolSign,
       S.Client AS ClientName,  
       S.FinalRptSign as DateSign,
       DT.scheduledstartdate AS SchedStart, 
       DT.actualstartdate AS ActStart, 
       DT.actualenddate AS ActEnd,
       DT.scheduledenddate AS SchedEnd,
       DT.[comments] AS AssignmentType,
       DENSE_RANK() OVER(PARTITION BY DT.[comments] Order by S.StudyNo DESC) ranking
FROM dbo.View_RptStudyModule S

Any help or clarification is greatly appreciated!

EDIT:

I've added in row_number vs dense_rank and got this as the top of my script now:

SELECT DISTINCT
       S.StudyNo AS Study, 
       S.Status, 
       S.ProtocolSign,
       S.Client AS ClientName,  
       S.FinalRptSign,
       DT.scheduledstartdate AS SchedStart, 
       DT.actualstartdate AS ActStart, 
       DT.actualenddate AS ActEnd,
       DT.scheduledenddate AS SchedEnd,
       DT.[comments] AS AssignmentType,
       ROW_NUMBER() OVER (PARTITION BY S.StudyNo ORDER BY DT.[comments] DESC) AS RN
FROM dbo.View_RptStudyModule S 

My column RN now contains 1 or 2 and where 2 is present, it is a second instance of AssignmentType. Just need to figure out how to capture this and it should be it

Edit 2:

Condensed version that runs but no output

SELECT *
FROM (
    SELECT DISTINCT
       S.StudyNo as StudyID, 
       S.Client AS ClientName,  
       S.FinalRptSign AS DateSign,
       DT.scheduledstartdate AS SchedStart,
       P.PhaseCostCenter,
       DT.[comments] as AssignmentType
    FROM dbo.View_RptStudyModule S
     JOIN [dbo].[View_RptPhase] P ON P.studyrowid = S.StudyRowId 
     JOIN (SELECT [datetypetxt],
              [PhaseRowId],
              [scheduledstartdate],
              [comments] FROM [dbo].[View_Datetracked]
            WHERE [scheduledstartdate] IS NOT NULL
            AND [datetypetxt] = 'Dataset SAS Files'
            AND comments LIKE '%SEND%'
            AND [actualenddate] IS NULL) DT ON DT.PhaseRowId = P.phaserowid
    WHERE
    P.PhaseType = 'Dataset'  
    AND P.PhaseCostCenter IN ('1', '2', '3', '4', '5','6','7','8')
    AND DT.[scheduledstartdate] > '2021-01-01'
    AND S.FinalRptSign IS NULL    

    Group By
       S.StudyNo, 
       S.Client,  
       S.FinalRptSign,
       DT.scheduledstartdate,
       P.PhaseCostCenter,
       DT.[comments]
    HAVING COUNT(*) = 1
    AND MAX(DT.[comments]) = 'Final'
) SELECTED
 JOIN dbo.View_RptStudyModule S ON S.StudyNo = SELECTED.StudyNo

Solution

  • Although your question title references DENSE_RANK() it appears that your real question is, I purely want studyID's that have 1 AssignmentType which is "Final". I do not believe that DENSE_RANK() is what you need.

    To just get the studyID's, you can use a GROUP BY together with a couple of HAVING conditions to check for (1) Only one row for a given studyID and (2) AssignmentType= "Final"

    Something like:

    SELECT S.StudyNo
    FROM View_RptStudyModule S
    GROUP BY S.StudyNo
    HAVING COUNT(*) = 1
    AND MAX(S.comments) = 'Final'
    

    This can be wrapped up as a subselect and included as part of a larger query.

    If you need the complete row instead of just the ID (to avoid a separate lookup), you could also try either:

    SELECT *
    FROM View_RptStudyModule S
    WHERE S.comments = 'Final'
    AND NOT EXISTS(
        SELECT *
        FROM View_RptStudyModule S2
        WHERE S2.StudyNo = S.StudyNo
        AND S2.comments = 'Draft'
    ) 
    

    or

    SELECT *
    FROM (
        SELECT *, COUNT(*) OVER(PARTITION BY S.StudyNo) AS NumRows
        FROM View_RptStudyModule S
    ) SN
    WHERE SN.NumRows = 1
    AND SN.comments = 'Final'
    

    These should all produce the same results except for a case where a study has multiple "Final" rows and but no "Draft" rows. If your data might contain such cases, you will need to decide whether or not those rows should be included in the results.

    See this db<>fiddle for a demo. Since your original post did not include table schemas or sample data, the demo uses your original posted results as source data.