Search code examples
sqlsql-serversql-server-2016

String Matching Struggle


I need to find State abbreviations in column name my_column. This column can contain values like these

John Smith of AZ --> Match

John Smith of AZ(Tucson)' --> Match

AZ John Smith --> Match

John Smith AZ for Tucson --> Match

Utah Jazz --> Don't Match

Azyme --> Don't Match

'Hazy --> Don't Match

I tried using CASE expressions with CHARINDEX and LIKE to do all this matching, but it is getting super-ugly. I wonder if there is a better way.

I asked our DBA to install Full-Text Index to see if I can do something better with CONTAINS, but not sure if it really helps.


Solution

  • How about something like

    declare @testcases table (testval varchar(50));
    
    insert into @testcases 
    values 
    ('John Smith of AZ'),
    ('John Smith of AZ(Tucson)'),
    ('AZ John Smith'),
    ('John Smith AZ for Tucson'),
    ('Utah Jazz'),
    ('Azyme'),
    ('Hazy')
    
    select PATINDEX('%[^A-Z]AZ[^A-Z]%',testval) + PATINDEX('AZ[^A-Z]%',testval) + PATINDEX('%[^A-Z]AZ',testval)
    from @testcases;
    

    It'll match anything containing AZ, that doesn't border an alpha-numeric character. Mind you, this isn't very performant, but it will work.

    Also it only works for Arizona. To find the other 49 states you'd have to scan again. I would probably try to extract the state in your application before inserting into the database.