Search code examples
regexpostgresqlstring-function

Functions on regexp_replace result in Postgres


I have a database of addresses where acronyms have been separated with a space, I want to remove these space so I turned to trusty regular expressions. However, I am struggling to perform a secondary function on the regexp result '\&' - I have checked the forums and docs and just cannot get this to work. Example data I have is as follows:

  • 'A V C Welding' should be 'AVC Welding'
  • 'H S B C' should be 'HSBC'
  • etc..

I have the following regexp:

trim(regexp_replace(organisation || ' ', '(([A-Z]\s){1}){2,}', replace('\&',' ',''), 'g'))

The replace('\&',' ','') is not having any effect at all, I just get the same string back. I have tried other functions e.g. lower('\&') and none of these seem to work as expected. Concatenation with || does work however. I have tried casting the '\&' to text, tried replace('' || '\&' || '',' ','') - still, no joy.

Any advice would be much appreciated, I am sure the solution is something very simple but I just cannot see where to go next!


Solution

  • What you are trying to do with \& will never work. The \& pattern will replace the entire pattern, but you need a solution that works on individual parts.

    What you need is to replace the pattern CAPITAL-space with just CAPITAL but only when followed by another capital which is not the start of a longer word. In other words: you need a negative lookahead and if the pattern is matched, then replace only the first atom of the replace string:

    select regexp_replace('A V C Welding', '([A-Z]){1}(\s){1}(?![A-Z][a-z])', '\1', 'g');
    

    You can replace the negative lookahead pattern with something broader if needed (such as no capital letter start, numbers, punctuation, etc.).