I am trying to filter out "illegal" characters in a column with urls
I tried
select seourl from products where seourl like '%[^0-9a-z&;/_.?,!%-]%';
But I got a lot of seemingly ok urls selected as illegal. So I tried this to get more debug info:
declare @p nvarchar(40)='%[^0-9a-z&;/_.?,!%-]%';
select substring(seourl,patindex(@p,seourl),2) s,
patindex(@p,seourl) n,
seourl from products where
seourl like @p
It showed I got hits on strings containing 'aa', like the ones below:
AA 1 AA_Batteri__-_Mignon__1,5_v/1_stk
aa 2 Haarboerste
Could it be because the collation is Danish_Norwegian_CI_AS, where aa is alfabetized like å,å ?
It sure was.
By changing the collation, I no longer got the false positives:
declare @p nvarchar(40)='%[^0-9a-z&;/_.?,!%-]%';
select seourl from t_produkter where
CAST(seourl COLLATE SQL_Latin1_General_CP1_CI_AS as varchar(255)) like @p