I am trying to check database columns in PostgreSQL for special characters, white space (more than one space), extra tabs, and carriage returns. The idea for me is to find special characters such as (e.g., +," ",`, /,', etc.). I want to count how many occurrences there are of these values. I am fairly new to PostgreSQL and regular expressions.
Example,
roadmap/plan
Smith, Tom (2 spaces after Tom)
applicant's
see me (tab)
Q4 2024
FY2024+
So far I have tried the following SQL to find these values.
select col1 from table_name
where col1 ~* '[^A-Za-z0-9&\(\)\| \''.-\\s*\s*|\s+y\s+]+';
However, I am getting back records I do not care about like if there is a . character.
Example,
Dr. Tom Smith
I have also tried:
select col1 from table_name
where col1 ~* '[^A-Za-z0-9^\\''\-#+~_\t\r]';
I still get back records I don't care about:
Example,
Q4 2024
Tom Smith
My criteria is find:
What am I missing or need to take out of the regex?
Spaces are allowed, so put them in your character class, but add a check for doubled spaces:
where col1 ~* '[^A-Za-z0-9^\\''\-#+~_\t\r\n ]|\\s\\s+';