I'm running a series of SQL queries to find data that needs cleaning up. One of them I want to do is look for:
For example my name should be "John Doe". I would want it to find "JOhn Doe" or "JOHN DOE" or "John doe", but I would not want it to find "John Doe" since that is formatted correctly.
I am using SQL Server 2008.
The key is to use a case-sensitive collation, i.e. Latin1_General_BIN
*. You can then use a query with a LIKE
expression like the following (SQL Fiddle demo):
select *
from foo
where name like '%[A-Z][A-Z]%' collate Latin1_General_BIN --two uppercase in a row
or name like '% [a-z]%' collate Latin1_General_BIN --space then lowercase
*As per How do I perform a case-sensitive search using LIKE?, apparently there is a "bug" in the Latin1_General_CS_AS
collation where ranges like [A-Z]
fail to be case sensitive. The solution is to use Latin1_General_BIN
.