Search code examples
sql-servert-sqlpartition-by

T-SQL "partition by" results not as expected


What I'm trying to do is get a total count of "EmailAddresses" via using partitioning logic. As you can see in the result set spreadsheet, the first record is correct - this particular email address exists 109 times. But, the second record, same email address, the numberOfEmailAddresses column shows 108. And so on - just keeps incrementing downward by 1 on the same email address. Clearly, I'm not writing this SQL right and I was hoping to get some feedback as to what I might be doing wrong.

What I would like to see is the number 109 consistently down the column numberOfEmailAddresses for this particular email address. What might I be doing wrong?

Here's my code:

select
    Q1.SubscriberKey,
    Q1.EmailAddress,
    Q1.numberOfEmailAddresses
from
    (select
        sub.SubscriberKey as SubscriberKey,
        sub.EmailAddress as EmailAddress,
        count(*) over (partition by sub.EmailAddress order by sub.SubscriberKey asc) as numberOfEmailAddresses
    from    
        ent._Subscribers sub) Q1

And here's my result set, ordered by "numberOfEmailAddresses": enter image description here


Solution

  • select distinct
        Q1.SubscriberKey,
        Q1.EmailAddress,
    (select count(*) from ent._Subscribers sub where sub.EmailAddress = Q1.EmailAddress) as numberOfEmailAddress
    from ent._Subscribers Q1
    

    will get you what you want. I think the inclusion of the order by in your partition function is what is causing the descending count. Ordering in a partition function further subdivides the partition as I understand it.

    select
        Q1.SubscriberKey,
        Q1.EmailAddress,
        Q1.numberOfEmailAddresses
    from
        (select
            sub.SubscriberKey as SubscriberKey,
            sub.EmailAddress as EmailAddress,
            count(*) over (partition by sub.EmailAddress) as numberOfEmailAddresses
        from    
            ent._Subscribers sub) Q1
    

    May also work but I can't find a suitable dataset to test.