I have a database of addresses where acronyms have been separated with a space, I want to remove these space so I turned to trusty regular expressions. However, I am struggling to perform a secondary function on the regexp result '\&' - I have checked the forums and docs and just cannot get this to work. Example data I have is as follows:
I have the following regexp:
trim(regexp_replace(organisation || ' ', '(([A-Z]\s){1}){2,}', replace('\&',' ',''), 'g'))
The replace('\&',' ','')
is not having any effect at all, I just get the same string back. I have tried other functions e.g. lower('\&')
and none of these seem to work as expected. Concatenation with || does work however. I have tried casting the '\&' to text, tried replace('' || '\&' || '',' ','')
- still, no joy.
Any advice would be much appreciated, I am sure the solution is something very simple but I just cannot see where to go next!
What you are trying to do with \&
will never work. The \&
pattern will replace the entire pattern, but you need a solution that works on individual parts.
What you need is to replace the pattern CAPITAL-space
with just CAPITAL
but only when followed by another capital which is not the start of a longer word. In other words: you need a negative lookahead and if the pattern is matched, then replace only the first atom of the replace string:
select regexp_replace('A V C Welding', '([A-Z]){1}(\s){1}(?![A-Z][a-z])', '\1', 'g');
You can replace the negative lookahead pattern with something broader if needed (such as no capital letter start, numbers, punctuation, etc.).