I'm using PATINDEX
to validate if a column has invalid special chars. But I'm facing some problems with some chars.
SELECT PATINDEX(N'%[^a-zA-Z0-9 !"&''()*+,-./:;?=%~@[]_{}\|<>]%' collate SQL_Latin1_General_CP850_BIN, 'abc╢123' collate SQL_Latin1_General_CP850_BIN)
The valid chars are ! " & ' ( ) * + , - . / : ; ? = % ~ @ [ ] _ { } \ | < >
.
The ╢
is not one of the valid chars but PATINDEX still returns 0.
Is there another way to validate that?
Escaping is one method.
But if you just want to ignore the readable ASCII characters, then the range could be simplified.
[^ -~]
: not between space and ~
-- Sample data
declare @T table (col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN primary key);
insert into @T (col) values
(N'abc╢123'),
(N'xyz123[}'''),
(N'abc௹123');
-- Query
SELECT col, PATINDEX(N'%[^ -~]%' collate SQL_Latin1_General_CP850_BIN, col) as pos
FROM @T;
Returns:
col pos
-------- ----
abc╢123 4
abc௹123 4
xyz123[}' 0
But to also locate the caret and some others, it's more complicated.
Since PATINDEX doesn't have ESCAPE as LIKE does.
-- Sample data
declare @T table (
id int identity(1,1) primary key,
col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN
);
insert into @T (col) values
(N'xyz[123]}''') -- good
,(N'abc╢123') -- bad
,(N'abc௹123') -- bad
,(N'def#456') -- bad
,(N'def^456') -- bad
;
-- also locate #, ´ , ` and ^
SELECT col,
CASE
WHEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate SQL_Latin1_General_CP850_BIN, col) > 0
THEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate SQL_Latin1_General_CP850_BIN, col)
ELSE CHARINDEX(N'^' collate SQL_Latin1_General_CP850_BIN, col)
END AS pos
FROM @T;
Returns:
xyz[123]}' 0
abc╢123 4
abc௹123 4
def#456 4
def^456 4