Search code examples
regexpostgresqlspecial-characterswhitespace

PostgreSQL - how to find special characters, white space, extra tabs, and carriage returns in text fields?


I am trying to check database columns in PostgreSQL for special characters, white space (more than one space), extra tabs, and carriage returns. The idea for me is to find special characters such as (e.g., +," ",`, /,', etc.). I want to count how many occurrences there are of these values. I am fairly new to PostgreSQL and regular expressions.

Example,


roadmap/plan
Smith, Tom  (2 spaces after Tom)
applicant's
    see me (tab)
Q4  2024
FY2024+

So far I have tried the following SQL to find these values.

select col1 from  table_name
where col1  ~*  '[^A-Za-z0-9&\(\)\| \''.-\\s*\s*|\s+y\s+]+';

However, I am getting back records I do not care about like if there is a . character.

Example,

Dr. Tom Smith

I have also tried:

select col1 from  table_name
where col1  ~* '[^A-Za-z0-9^\\''\-#+~_\t\r]';

I still get back records I don't care about:

Example,

Q4 2024
Tom Smith

My criteria is find:

  • Quotation marks (Single and Double)
  • Back Ticks
  • <,>
  • Apostrophes
  • Plus signs +
  • {}
  • &
  • |
  • Extra Tabs
  • Carriage Returns
  • Spaces between words
  • Backslash or Forward Slash
  • Piping
  • Alphanumeric values are: a-z or A-Z
  • Numeric values are: 0-9

What am I missing or need to take out of the regex?


Solution

  • Spaces are allowed, so put them in your character class, but add a check for doubled spaces:

    where col1 ~* '[^A-Za-z0-9^\\''\-#+~_\t\r\n ]|\\s\\s+';