Search code examples
sql-serversql-server-2008-r2soundex

Find SOUNDEX of each word in the column


I have the following data:

create table testing
(
name varchar(100)
);

insert into testing values('Mr.Alex James Henrry');
insert into testing values('Mr John Desto');
insert into testing values('Ms.Lisa Jack Jerry Han');
insert into testing values('Smith White');
insert into testing values('Rowny James Duest');

Note: I want to find soundex value of each word in the name field except English Honorifics (Mr, Ms etc).

Expected Result:

name                    name_soundex
-------------------------------------
Mr.Alex James Henrry    A420 J520 H560
Mr John Desto           J500 D230
Ms.Lisa Jack Jerry Han  L200 J200 J600 H500
Smith White             S530 W300
Rowny James Duest       R500 J520 D230

Tried:

Adding column to store soundex:

alter table testing
add name_soundex varchar(500);

Update:

update testing
set name_soundex = SOUNDEX(name)

Getting following output:

name                    name_soundex
-------------------------------------
Mr.Alex James Henrry    M600
Mr John Desto           M600
Ms.Lisa Jack Jerry Han  M200
Smith White             S530
Rowny James Duest       R500

Solution

  • You need split the name(s) into their respective parts and the "remerge" them. SQL Server 2008 (which is almost entirely out of support, so you should be looking at your upgrade plans) doesn't have a splitter built in. SQL Server 2016+ does, however, it doesn't provide ordinal position; therefore I have used DelimitedSplit8K (A google will find this). If you are using 2012+, I would recommend DelimitedSplit8K_LEAD (even on 2016+, as ordinal position is important):

    WITH VTE AS(
        SELECT *
        FROM (VALUES('Mr.Alex James Henrry'),
                    ('Mr John Desto'),
                    ('Ms.Lisa Jack Jerry Han'),
                    ('Smith White'),
                    ('Rowny James Duest')) V([Name]))
    SELECT [name],
           STUFF((SELECT ' ' + SOUNDEX(DS.item)
                  FROM dbo.DelimitedSplit8K(REPLACE([name],'.',' '),' ') DS
                  WHERE DS.item NOT IN ('Mr','Mrs','Miss','...') --You know what your acceptable titles are
                                                                 --Although, seeing as you have both "Mr {name}" and Mr.{name}", maybe not :/
                  ORDER BY DS.itemnumber
                  FOR XML PATH('')),1,1,'') AS name_soundex
    FROM VTE;