Search code examples
sqlsql-servert-sqlcoalesce

Finding the first non-null in a partition


I am using SQL Server. I have a database which has a person and year (the combination creates uniqueness) where another column (we'll call it married status) has null values. I want to impute these nulls. I figure that since this column usually doesn't change often, I will take the next non-null value for that person, or if it's at the end of the data, the previous non-null value. For instance:

Person Year MaritalStatus
Moe 2001 NULL
Moe 2002 NULL
Moe 2003 Married
Larry 2001 Single
Larry 2002 NULL
Larry 2003 NULL
Curly 2001 Single
Curly 2002 NULL
Curly 2003 Married

Moe's nulls should be changed to Married, Larry's nulls should be changed to single, and Curly's null should be changed to Married.

My thought was to use coalesce with over like so (with similar logic to choose preceding nulls):

select
    Person,
    Year,
    coalesce(MaritalStatus) over (partition by Person order by Year rows between current row and unbounded following)
from mytable

It appears over doesn't work with coalesce. Is there some easy way to do this without CTE's or subqueries (I'm trying to avoid this if possible as it will make it a little harder to understand for the next person).

Edit: Based on Tim's answer I think I have something:

cte AS (
    SELECT 
        *, 
        ROW_NUMBER() OVER 
            (PARTITION BY Person,
            CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
            ORDER BY Year DESC) rn
    FROM mytable
),

cte2 as (
SELECT 
    t1.Person,
    t1.Year,
    max(t2.rn) as maxrn,
    min(t3.rn) as minrn
FROM mytable t1
LEFT JOIN cte t2
    ON t2.Person = t1.Person AND
       t2.MaritalStatus IS NOT NULL and
       t1.year<t2.year
LEFT JOIN cte t3
    ON t3.Person = t1.Person AND
       t3.MaritalStatus IS NOT NULL and
       t1.year>t3.year
group by t1.Person,t1.Year
),

cte3 as(
    select
        t1.person,
        t1.year,
        coalesce(t1.maritalstatus,t4.maritalstatus,t3.maritalstatus) as maritalstatus
    from mytable t1
        left join cte2 t2
            on t1.person=t2.person and
            t1.year=t2.year
        left join cte t3
            on t1.person=t3.person and
            t3.maritalstatus is not null and
            t2.maxrn=t3.rn
        left join cte t4
            on t1.person=t4.person and
            t4.maritalstatus is not null and
            t2.minrn=t4.rn
            
)
select * from cte3

Solution

  • We could try the following approach. Here we are applying ROW_NUMBER with a partition on person and also a partition on whether or not the marital status value is NULL. We then use the most recent non NULL marital status value, per person, to fill in any NULL missing marital status values.

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY Person,
                                   CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
                      ORDER BY Year DESC) rn
        FROM mytable
    )
    
    SELECT t1.Person, t1.Year, COALESCE(t1.MaritalStatus, t2.MaritalStatus) AS MaritalStatus
    FROM mytable t1
    LEFT JOIN cte t2
        ON t2.Person = t1.Person AND
           t2.MaritalStatus IS NOT NULL AND
           t2.rn = 1;
    

    screen capture from demo link below

    Demo