Search code examples
sql-serversql-server-2008string-parsingthai

Parse Thai Name into First Last


I need to parse a list of FullNames into First and Last Name. If a middle name is included, it should be included in the fist name field.

John Smith would be:

FirstName = John

LastName = Smith

John J. Smith would be:

FirstName = John J.

LastName = Smith

The issue is the names might be either Thai or English character set. I need to properly parse either set. I have tried just about everything...

DECLARE @FullName NVARCHAR(MAX) = N'กล้วยไม้ สวามิวัศดุ์'
--DECLARE @FullName NVARCHAR(MAX) = N'Mark C. Wilson'

SELECT 
    LEN(@FullName) AS StringLength,
    LEN(@FullName) - LEN(REPLACE(@FullName,N' ', N'')),
    LEN(REPLACE(@FullName,N' ', N'')),
    @FullName AS FullName,
    REVERSE(@FullName) AS ReverseName, -- This is obviously no Reverse of the string
    CHARINDEX(N' ', REVERSE(@FullName)) AS LastSpaceLocation,
    CHARINDEX(N' ', @FullName) AS FirstSpaceLocation,
    LEN(@FullName) AS LenString,
    STUFF(@FullName, 1, CHARINDEX(N' ', @FullName), N'') as FirstName,
    RIGHT(@FullName, LEN(@FullName) - CHARINDEX(N' ', @FullName) + 1) as LastName,
    LEFT(@FullName, LEN(@FullName) - CHARINDEX(N' ', REVERSE(@FullName))) AS FirstName,
    STUFF(RIGHT(@FullName, CHARINDEX(N' ', REVERSE(@FullName))),1,1,N'') AS LastName,
    LEN(@FullName),
    REVERSE(@FullName),
    REVERSE(' '),
    LEN(@FullName) - CHARINDEX(reverse(' '), REVERSE(@FullName)) - LEN(' ') + 1

The REVERSE simply does not work when the Thai character set is used.


Solution

  • I can't read Thai (I'm not that bright), but perhaps this may help.

    Here we are using a CROSS APPLY to "fix" the string, and then it is a small matter of PasrName() and Concat()

    I should add, parsing names is a slippery slope. One needs to consider

    • Multi Word Last Names ie De la Cruz
    • Suffix ie. Richard R Cappelletti MD

    Example

    Declare @YourTable table (FullName nvarchar(100))
    Insert Into @YourTable values
     ('John Smith')
    ,('John J. Smith')
    ,(N'กล้วยไม้ สวามิวัศดุ์')
    
    Select A.*
          ,LastName  = replace(parsename(S,1),'|','.')
          ,FirstName = replace(concat(parsename(S,4),' '+parsename(S,3),' '+parsename(S,2)),'|','.')
     From  @YourTable A
     Cross Apply ( values (replace(replace(FullName,'.','|'),' ','.'))) B(S)
    

    Returns

    FullName          LastName    FirstName
    John Smith        Smith       John
    John J. Smith     Smith       John J.
    กล้วยไม้ สวามิวัศดุ์    สวามิวัศดุ์     กล้วยไม้
    

    EDIT 2008 Version

    Select A.*
          ,LastName  = replace(parsename(S,1),'|','.')
          ,FirstName = replace( IsNull(parsename(S,4),'') + IsNull(' '+parsename(S,3),'') + IsNull(' '+parsename(S,2),''),'|','.')
     From  @YourTable A
     Cross Apply ( values (replace(replace(FullName,'.','|'),' ','.'))) B(S)