I am using SQL Server. I have a database which has a person and year (the combination creates uniqueness) where another column (we'll call it married status) has null values. I want to impute these nulls. I figure that since this column usually doesn't change often, I will take the next non-null value for that person, or if it's at the end of the data, the previous non-null value. For instance:
Person | Year | MaritalStatus |
---|---|---|
Moe | 2001 | NULL |
Moe | 2002 | NULL |
Moe | 2003 | Married |
Larry | 2001 | Single |
Larry | 2002 | NULL |
Larry | 2003 | NULL |
Curly | 2001 | Single |
Curly | 2002 | NULL |
Curly | 2003 | Married |
Moe's nulls should be changed to Married, Larry's nulls should be changed to single, and Curly's null should be changed to Married.
My thought was to use coalesce with over like so (with similar logic to choose preceding nulls):
select
Person,
Year,
coalesce(MaritalStatus) over (partition by Person order by Year rows between current row and unbounded following)
from mytable
It appears over doesn't work with coalesce
. Is there some easy way to do this without CTE's or subqueries (I'm trying to avoid this if possible as it will make it a little harder to understand for the next person).
Edit: Based on Tim's answer I think I have something:
cte AS (
SELECT
*,
ROW_NUMBER() OVER
(PARTITION BY Person,
CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
ORDER BY Year DESC) rn
FROM mytable
),
cte2 as (
SELECT
t1.Person,
t1.Year,
max(t2.rn) as maxrn,
min(t3.rn) as minrn
FROM mytable t1
LEFT JOIN cte t2
ON t2.Person = t1.Person AND
t2.MaritalStatus IS NOT NULL and
t1.year<t2.year
LEFT JOIN cte t3
ON t3.Person = t1.Person AND
t3.MaritalStatus IS NOT NULL and
t1.year>t3.year
group by t1.Person,t1.Year
),
cte3 as(
select
t1.person,
t1.year,
coalesce(t1.maritalstatus,t4.maritalstatus,t3.maritalstatus) as maritalstatus
from mytable t1
left join cte2 t2
on t1.person=t2.person and
t1.year=t2.year
left join cte t3
on t1.person=t3.person and
t3.maritalstatus is not null and
t2.maxrn=t3.rn
left join cte t4
on t1.person=t4.person and
t4.maritalstatus is not null and
t2.minrn=t4.rn
)
select * from cte3
We could try the following approach. Here we are applying ROW_NUMBER
with a partition on person and also a partition on whether or not the marital status value is NULL
. We then use the most recent non NULL
marital status value, per person, to fill in any NULL
missing marital status values.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY Person,
CASE WHEN MaritalStatus IS NULL THEN 0 ELSE 1 END
ORDER BY Year DESC) rn
FROM mytable
)
SELECT t1.Person, t1.Year, COALESCE(t1.MaritalStatus, t2.MaritalStatus) AS MaritalStatus
FROM mytable t1
LEFT JOIN cte t2
ON t2.Person = t1.Person AND
t2.MaritalStatus IS NOT NULL AND
t2.rn = 1;