I have a simple SQLite database where data (names) is added with a C# application. The names usually get copied and pasted from .pdf files. I found out that sometimes copying a name from .pdf generates some weird symbols. During browsing data with SQLite DB Browser I saw that some records in my database have things mingled in between like 'DC3', 'FS', 'US' and so on:
This messes with 'WHERE' clause in my queries, for example the following query would yield 0 results:
SELECT Id FROM tblPerson WHERE Name = 'Alex Denelgo';
Can someone explain what these symbols are and how can I write query to find all the "corrupted" name records? I can't go one by one manually with browser since the data already contains thousands of different names.
It seems these symbols are Non printable ASCII control characters.
The way I found the "corrupted" records is using regex. If you have the same problem as me you can use the following query to find these kinds of records. I am selecting all records minus records that only contain letters from a-z, space and dot you can modify the regex for your case of course:
SELECT Name FROM tblPerson
EXCEPT
SELECT Name FROM tblPerson WHERE Name REGEXP "^[A-Za-z .]+$";