Search code examples
regexstringpostgresqlreplacewhitespace

PostgreSQL regexp_replace() to keep just one whitespace


I need to clean up a string column with both whitespaces and tabs included within, at the beginning or at the end of strings (it's a mess !). I want to keep just one whitespace between each word. Say we have the following string that includes every possible situation :

mystring = '  one two    three      four    '
  • 2 whitespaces before 'one'
  • 1 whitespace between 'one' and 'two'
  • 4 whitespaces between 'two' and 'three'
  • 2 tabs after 'three'
  • 1 tab after 'four'

Here is the way I do it :

  1. I delete leading and trailing whitespaces
  2. I delete leading and trailing tabs
  3. I replace both 'whitespaces repeated at least two' and tabs by a sole whitespace

WITH

  t1 AS (SELECT'  one two    three      four    '::TEXT AS mystring),

  t2 AS (SELECT TRIM(both ' ' from mystring) AS mystring FROM t1),

  t3 AS (SELECT TRIM(both '\t' from mystring) AS mystring FROM t2)

  SELECT regexp_replace(mystring, '(( ){2,}|\t+)', ' ', 'g') FROM t3 ;

I eventually get the following string, which looks nice but I still have a trailing whitespace...

'one two three four '

Any idea on doing it in a more simple way and solving this last issue ?

Many thanks !


Solution

  • SELECT trim(regexp_replace(col_name, '\s+', ' ', 'g')) as col_name FROM table_name;
    

    Or In case of update :

    UPDATE table_name SET col_name = trim(regexp_replace(col_name, '\s+', ' ', 'g'));
    

    The regexp_replace is flags are described on this section of the documentation.