Search code examples
t-sqldesign-patternssql-like

tsql Pattern [^a-z] finds aa


I am trying to filter out "illegal" characters in a column with urls

I tried

select seourl from products where seourl like '%[^0-9a-z&;/_.?,!%-]%';

But I got a lot of seemingly ok urls selected as illegal. So I tried this to get more debug info:

declare @p nvarchar(40)='%[^0-9a-z&;/_.?,!%-]%';

select substring(seourl,patindex(@p,seourl),2) s,
patindex(@p,seourl) n,
seourl from products where 
seourl like @p

It showed I got hits on strings containing 'aa', like the ones below:

AA  1  AA_Batteri__-_Mignon__1,5_v/1_stk
aa  2  Haarboerste

Could it be because the collation is Danish_Norwegian_CI_AS, where aa is alfabetized like å,å ?


Solution

  • It sure was.

    By changing the collation, I no longer got the false positives:

    declare @p nvarchar(40)='%[^0-9a-z&;/_.?,!%-]%';
    
    select seourl from t_produkter where 
    CAST(seourl COLLATE SQL_Latin1_General_CP1_CI_AS as varchar(255)) like @p