Search code examples
sqlsql-servert-sqlazure-sql-database

How do I coalesce across rows to remove nulls in TSQL?


I am using Azure SQL Database. I have a table in the following structure, to track which people are working in various posts at various times. To provide a simplified version:

CREATE TABLE PersonWorkload
(
WorkloadID INT PRIMARY KEY IDENTITY,
Post VARCHAR(30),
Period VARCHAR(12),
PersonID INT,
GroupID INT,
Notes VARCHAR(100)
)

And here's a sample of data

INSERT INTO PersonWorkload (Post, Period, PersonID, GroupID, Notes)
    VALUES 
    ('Group Leader', 'Q1', 1, 1, NULL),
    ('Group Leader', 'Q2', 1, 1, NULL),
    ('Group Leader', 'Q3', 1, 2, NULL),
    ('Group Leader', 'Q4', 1, 2, NULL),
    ('Security', 'Q1', 3, 3, NULL),
    ('Security', 'Q2', 3, 3, NULL),
    ('Security', 'Q3', 4, 4, NULL),
    ('Security', 'Q4', 4, 4, NULL),
    ('Team member', 'Q1', 7, 5, NULL),
    ('Team member', 'Q2', 7, 5, 'Leaving programme this qtr'),
    ('Team member', 'Q3', 13, 6, NULL),
    ('Team member', 'Q4', 13, 6, NULL),
    ('Team member', 'Q1', 8, 7, NULL),
    ('Team member', 'Q2', 8, 7, NULL),
    ('Team member', 'Q3', 8, 7, NULL),
    ('Team member', 'Q4', 8, 7, NULL),
    ('Team member','Q1',9,8,'Temp posting'),
    ('Team member', 'Q2', 10, 9, NULL),
    ('Team member', 'Q3', 10, 9, NULL),
    ('Team member', 'Q4', 10, 9, NULL),
    ('Comms','Q2',11,10,NULL),
    ('Comms','Q3',11,10,NULL),
    ('Comms','Q4',11,10,NULL)

The GroupID column is used because sometimes I need to treat a set of these posts together. A new person in a post will always trigger a new groupID, but sometimes a person can be in a post for a longer period, and that period can sometimes contain more than one groupID (so if a person is in a post for a year, it might be that Jan - June have one GroupID, and July to Dec a different GroupID).

Sometimes only one person will hold a post at a time, sometimes more than one, and sometimes no one will be in a post.

I need to pivot this information around in order to show a column for each period, along with who worked in each post. It may be that there are one or more rows for each post.

I've created a SQL PIVOT query to achieve this, and I've put the results into a temporary table.

Here's an example of what the temporary table looks like:

CREATE TABLE SampleData (
    Post VARCHAR(100)
    ,GroupID INT
    ,[Q1] INT
    ,[Q2] INT
    ,[Q3] INT
    ,[Q4] INT
    )

...and the statement to populate it:

INSERT INTO SampleData

SELECT * 
FROM
    (SELECT Post, Period, PersonID, GroupID
    FROM PersonWorkload) t
PIVOT 
    (MAX(t.PersonID)
    FOR t.Period IN ([Q1],[Q2],[Q3],[Q4])
    ) pvt

If I include the groupID I end up with data from my PIVOT query which looks like this:

Post GroupID Q1 Q2 Q3 Q4
Group Leader 1 1 1 NULL NULL
Group Leader 2 NULL NULL 1 1
Security 3 3 3 NULL NULL
Security 4 NULL NULL 4 4
Team member 5 7 7 NULL NULL
Team member 6 NULL NULL 13 13
Team member 7 8 8 8 8
Team member 8 9 NULL NULL NULL
Team member 9 NULL 10 10 10
Comms 10 NULL 11 11 11

This is close to what I want - except that I want to "collapse" the rows down, to look more like this - which shows that sometimes (top row) a person is in post all year, sometimes the post is filled by someone for the first couple of periods, then someone else comes in (2nd row), sometimes multiple people hold a post at the same time (Team members) and so on.

Post Q1 Q2 Q3 Q4
Group Leader 1 1 1 1
Security 3 3 4 4
Team member 7 7 13 13
Team member 8 8 8 8
Team member 9 10 10 10
Comms NULL 11 11 11

So, the obvious thing to do is to remove the GroupID, which I'm not needing for this report anyway. But, if I do that, the data "over-collapses" (sorry - don't know how better to express it) and we lose all except the lowest person ID.

Post Q1 Q2 Q3 Q4
Comms NULL 11 11 11
Group Leader 1 1 1 1
Security 3 3 4 4
Team member 9 10 13 13

Now I've lost, for example, person IDs 7 and 8 as team members, because they're not the max personID.

Is there a way to do this? I've been working on it for 2 days now, and can't get what I need.

This is a simplified version - in practice there are about 40 posts, and the report would need to have 15 time periods.

Any help gratefully received.

Thanks


Solution

  • As mentioned by @lptr in the comments:

    You can add a row-numbering column to the pivot, to ensure that unique rows do not get merged.

    SELECT * 
    FROM (
        SELECT
          Post,
          Period,
          PersonID,
          dense_rank() over (partition by Post, Period order by GroupID, PersonID) as rn
        FROM PersonWorkload
    ) t
    PIVOT 
        (MAX(t.PersonID)
        FOR t.Period IN ([Q1],[Q2],[Q3],[Q4])
        ) pvt;
    

    db<>fiddle

    ROW_NUMBER() should also work here