Search code examples
sqlsql-serversoundex

Why is SQL Server's SOUNDEX giving me inconsistent results?


Can someone provide some insight on why the SOUNDEX is giving different results when certain characters are capitalized vs lower case? From what I read online, SOUNDEX ignores case but I am getting different results and can't wrap my head around what would cause the differences. I can't mimic this example with are word like GUTTHRE.

Query                                  Result
select SOUNDEX('JESCHKE')              J200      <-- 200
select SOUNDEX(LOWER('JESCHKE'))       J220
select SOUNDEX('Jeschke')              J220
select SOUNDEX('jeschke')              J220
select SOUNDEX('JESChKE')              J220
select SOUNDEX('JESCHke')              J200      <-- for some reason capitalizing 'H' changes the result to 200
select SOUNDEX('jescHke')              J200      <-- 200

Any ideas why the values don't match?


Solution

  • So wiki states:

    "two letters with the same number separated by 'h' or 'w' are coded as a single number"

    So I think this applies when you are using a lowercase "h" hence the S is coded as a 2 and all the other characters would encode to 2 and are ignored as per:

    If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter

    I suspect that because the H is capitalised it resets this rule so that the following k is then coded as another 2.