Remove redundant/duplicate and keep most complete unique records

I have table:

Row	User	Given Name	Family Name	Address	ZIP
1	1	John	Smith	1 Main Rd	1000
2	1	John		1 Main Rd	1000
3	1	John			1000
4	1	John			0800
5	1	Johnny			1000
6	2	Tom	Lock	2 Main Rd	1001
7	2	Tommy		2 Main Rd	1001
8	2	Tom	L	2 Main Rd	1001
9	2				1001
10	3		Oak	4 Main Rd	1004
11	3	Tim		4 Main Rd	1004

I want to be able to remove 'redundant' records - which basically just repeats the same information. In the example above row 2 and row 3 are basically just repeating row 1 (but with additional nulls/blanks) and row 9 is repeating row 6.

So the desired output looks like:

Row	User	Given Name	Family Name	Address	ZIP
1	1	John	Smith	1 Main Rd	1000
4	1	John			0800
5	1	Johnny			1000
6	2	Tom	Lock	2 Main Rd	1001
7	2	Tommy		2 Main Rd	1001
8	2	Tom	L	2 Main Rd	1001
10	3		Oak	4 Main Rd	1004
11	3	Tim		4 Main Rd	1004

Looking to do this in either SQL Server (2014)/Spark/Python.

My initial approach was to:

count of null values of every row
self join table on User and compare every field (e.g. given name, family name etc)
tag records where the comparisons are either all invalid (e.g. nulls) and exact
remove those records tagged

It's sort of like a brute force approach so wondering if there are any better / more efficient way of doing this?

With rows 10 and 11 I want to keep both as the given name and family name are unique and not something that's been repeated which I realized my tagging probably wouldn't work.

Here is the code to create the table:

CREATE TABLE user_list (
    rown INT,
    username INT,
    given_name VARCHAR(50),
    family_name VARCHAR(50),
    address VARCHAR(50),
    zip varchar(10)
)

INSERT INTO user_list (rown, username, given_name, family_name, address, zip)
VALUES (1,1,'John', 'Smith', '1 Main Rd', '1000'),
       (2,1,'John', NULL, '1 Main Rd', '1000'),
       (3,1,'John', NULL, NULL, '1000'),
       (4,1,'John', NULL, NULL, '0800'),
       (5,1,'Johnny', NULL, NULL, '1000'),
       (6,2,'Tom', 'Lock', '2 Main Rd', '1001'),
       (7,2,'Tommy', NULL, '2 Main Rd', '1001'),
       (8,2,'Tom', 'L', '2 Main Rd', '1001'),
       (9,2,NULL, NULL, NULL, '1001'),
       (10,3,NULL, 'Oak', '4 Main Rd', '1004'),
       (11,3,'Tim', NULL, '4 Main Rd', '1004');

Solution

The following will check for matches and exclude the lesser match. In case of an exact match (same value/null combination), the lower rnum row is retrained.

WITH CTE_UserList AS (
    -- Calculate number of non-null, non-blank values for each row.
    SELECT
        UL.*,
        CASE WHEN UL.given_name        > '' THEN 1 ELSE 0 END
            + CASE WHEN UL.family_name > '' THEN 1 ELSE 0 END
            + CASE WHEN UL.address     > '' THEN 1 ELSE 0 END
            + CASE WHEN UL.zip         > '' THEN 1 ELSE 0 END
            AS ValueCount
    FROM user_list UL
)
SELECT *
FROM CTE_UserList UL
WHERE NOT EXISTS (
    -- Find equal or better match
    SELECT *
    FROM CTE_UserList UL2
    WHERE UL2.username = UL.username
    AND ( -- More values is a clear winner. If same values, pick lower rown
        UL2.ValueCount > UL.ValueCount
        OR (UL2.ValueCount = UL.ValueCount AND UL2.rown < UL.rown)
    )
    -- Match on equal or current row value null or blank
    AND (UL2.given_name  = UL.given_name  OR NULLIF(UL.given_name,  '') IS NULL)
    AND (UL2.family_name = UL.family_name OR NULLIF(UL.family_name, '') IS NULL)
    AND (UL2.address     = UL.address     OR NULLIF(UL.address,     '') IS NULL)
    AND (UL2.zip         = UL.zip         OR NULLIF(UL.zip,         '') IS NULL)
)
ORDER BY UL.username, UL.rown

Results (with extra data before duplicate elimination)

rown	username	given_name	family_name	address	zip	ValueCount	CoveredByRow
1	1	John	Smith	1 Main Rd	1000	4	null
2	1	John	null	1 Main Rd	1000	3	1
3	1	John	null	null	1000	2	1
4	1	John	null	null	0800	2	null
5	1	Johnny	null	null	1000	2	null
6	2	Tom	Lock	2 Main Rd	1001	4	null
7	2	Tommy	null	2 Main Rd	1001	3	null
8	2	Tom	L	2 Main Rd	1001	4	null
9	2	null	null	null	1001	1	6
10	3	null	Oak	4 Main Rd	1004	3	null
11	3	Tim	null	4 Main Rd	1004	3	null
12	4	null	Beam	4 Main Rd	1004	3	14
13	4	Jim	null	4 Main Rd	1004	3	14
14	4	Jim	Beam	4 Main Rd	1004	4	null
15	5	Exact	Match	null	null	2	null
16	5	Exact	Match	null	null	2	15

See this db<>fiddle with some extra data showing a few more partial and one exact match.