Search code examples
sql-servergroup-byrow-number

ms sql: How to use Dense_Rank in conjunction with a Group by


I have tried to find some records with a common table expression and in conjunction with a group by to find duplicateNo and subDuplicateNo of a specific table. Its all about finding duplicates from a table with a notification of duplicateNo and subDuplicateNo. But I'm wondering that subDuplicateNo always are 1. But how can I increment the column subDuplicateNo? I have until now searched a lot without any success :-(

My code is as below:

WITH cte AS (
    SELECT t.Upper_Fornavn_FirstPart, t.Upper_Efternavn, Upper_Adresse, t.Telefon, Upper_postnr, Count(*) AS occurences,
    ROW_NUMBER() OVER (ORDER BY t.Upper_Fornavn_FirstPart, t.Upper_Efternavn, Upper_Adresse, t.Telefon, Upper_postnr) as dubletno, t.subdubletno
    FROM (
        SELECT V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr,
            DENSE_RANK() OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr ORDER BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr) AS subdubletno
        FROM Medlemsdata V2
        LEFT OUTER JOIN Postnumre
        RIGHT OUTER JOIN MedlemsAdresse MA2 ON Postnumre.Postnummer = MA2.Postnr ON V2.FK_AdrID = MA2.AdrID
        WHERE V2.vennenr > 0 AND V2.Upper_Fornavn_FirstPart <> '' AND V2.Upper_Efternavn <> '' AND Upper_Adresse <> '' AND V2.Telefon <> '' AND Upper_postnr <> ''

    ) as t
    GROUP BY t.Upper_Fornavn_FirstPart, Upper_Efternavn, Upper_Adresse, Telefon, Upper_postnr, t.subdubletno
    HAVING Count(*) > 1
)

SELECT V1.Vennenr, V1.Upper_Fornavn_FirstPart, V1.Upper_Efternavn, MA1.Upper_Adresse, V1.Telefon, MA1.Upper_postnr, cte.dubletno, cte.subdubletno
FROM 
    Medlemsdata V1
    LEFT OUTER JOIN Postnumre
    RIGHT OUTER JOIN MedlemsAdresse MA1 ON Postnumre.Postnummer = MA1.Postnr ON V1.FK_AdrID = MA1.AdrID
    INNER JOIN cte ON 
        cte.Upper_Fornavn_FirstPart = V1.Upper_Fornavn_FirstPart AND 
        cte.Upper_Efternavn = v1.Upper_Efternavn AND
        cte.Upper_Adresse = MA1.Upper_Adresse AND
        cte.Upper_postnr = MA1.Upper_postnr AND
        cte.Telefon = V1.Telefon
WHERE V1.vennenr > 0 and cte.occurences > 1

and the actual and the expected data are as follows:

Actual result:

ID       Upper_Fornavn_FirstPart  UPPER_EFTERNAVN  UPPER_ADRESSE  TELEFON   UPPER_POSTNR  DubletNo  SubDubletNo
===============================================================================================================
1        ANNE                     JENSEN           BIRKEHAVEN 2   12345678  7100          1         1
2        ANNE                     JENSEN           BIRKEHAVEN 2   12345678  7100          1         1
3        TOM                      HANSEN           SLOTSGADE 1    98765432  6000          2         1
4        TOM                      HANSEN           SLOTSGADE 1    98765432  6000          2         1

---------------------------------------------------------------------------------------------------------------
Expected result:

ID       Upper_Fornavn_FirstPart  UPPER_EFTERNAVN  UPPER_ADRESSE  TELEFON   UPPER_POSTNR  DubletNo  SubDubletNo
===============================================================================================================
1        ANNE                     JENSEN           BIRKEHAVEN 2   12345678  7100          1         1
2        ANNE                     JENSEN           BIRKEHAVEN 2   12345678  7100          1         2
3        TOM                      HANSEN           SLOTSGADE 1    98765432  6000          2         1
4        TOM                      HANSEN           SLOTSGADE 1    98765432  6000          2         2

Thanks in advance for any kind of help.
I really hope that this technically can be done ...
Regards

Michael

Update: I have tried the following code - but now it gives me 1 in all records of both duplicateNo and subDuplicateNo :-(

WITH cte AS (
    SELECT t.Upper_Fornavn_FirstPart, t.Upper_Efternavn, Upper_Adresse, t.Telefon, Upper_postnr, t.dubletno, t.subdubletno
    FROM (
        SELECT V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr,
            COUNT(*) OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, Upper_Adresse, v2.Telefon, Upper_postnr) as occurences,
            DENSE_RANK() OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr ORDER BY Upper_postnr) AS dubletno,
            ROW_NUMBER() OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, MA2.Upper_Adresse, Telefon, Upper_postnr ORDER BY Upper_postnr) AS subdubletno
        FROM Medlemsdata V2
        LEFT OUTER JOIN Postnumre
        RIGHT OUTER JOIN MedlemsAdresse MA2 ON Postnumre.Postnummer = MA2.Postnr ON V2.FK_AdrID = MA2.AdrID
        WHERE V2.vennenr > 0 AND V2.Upper_Fornavn_FirstPart <> '' AND V2.Upper_Efternavn <> '' AND Upper_Adresse <> '' AND V2.Telefon <> '' AND Upper_postnr <> ''
        GROUP BY Upper_Fornavn_FirstPart, Upper_Efternavn, Upper_Adresse, Telefon, Upper_postnr
        HAVING Count(*) > 1
    ) as t
)

SELECT V1.Vennenr, V1.Upper_Fornavn_FirstPart, V1.Upper_Efternavn, MA1.Upper_Adresse, V1.Telefon, MA1.Upper_postnr, cte.dubletno, cte.subdubletno
FROM 
    Medlemsdata V1
    LEFT OUTER JOIN Postnumre
    RIGHT OUTER JOIN MedlemsAdresse MA1 ON Postnumre.Postnummer = MA1.Postnr ON V1.FK_AdrID = MA1.AdrID
    INNER JOIN cte ON 
        cte.Upper_Fornavn_FirstPart = V1.Upper_Fornavn_FirstPart AND 
        cte.Upper_Efternavn = v1.Upper_Efternavn AND
        cte.Upper_Adresse = MA1.Upper_Adresse AND
        cte.Upper_postnr = MA1.Upper_postnr AND
        cte.Telefon = V1.Telefon
WHERE V1.vennenr > 0

Solution

  • This query appears to be doing a lot of extra work. I also suspect your right join construction could be simplified.

    SELECT t.*
    FROM (
        SELECT V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, V2.Upper_Adresse, Telefon, Upper_postnr,
            COUNT(*) OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, V2.Upper_Adresse, Telefon, Upper_postnr) as occurences,
            DENSE_RANK() OVER (ORDER BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, V2.Upper_Adresse, Telefon, Upper_postnr) as dubletno,
            ROW_NUMBER() OVER (PARTITION BY V2.Upper_Fornavn_FirstPart, V2.Upper_Efternavn, V2.Upper_Adresse, Telefon, Upper_postnr ORDER BY Upper_postnr) AS subdubletno
        FROM data V2
        WHERE V2.vennenr > 0 AND V2.Upper_Fornavn_FirstPart <> '' AND V2.Upper_Efternavn <> '' AND Upper_Adresse <> '' AND V2.Telefon <> '' AND Upper_postnr <> ''
    ) as t
    WHERE t.occurences > 0;