I'd like to keep together initials (max two letters) when there are punctuation or spaces in between.
I have the following snippet to tackle almost everything, but I am having issues in keeping together initials that are separated by punctuation and space. For instance, this is working on regular regex, but not in postgresql:
SELECT regexp_replace('R Z ELEMENTARY SCHOOL',
'(\b[A-Za-z]{1,2}\b)\s+\W*(?=[a-zA-Z]{1,2}\b)',
'\1')
The outcome should be "RZ ELEMENTARY SCHOOL". Other examples will include:
A & D ALTERNATIVE EDUCATION
J. & H. KNOWLEDGE DEVELOPMENT
A. - Z. EVOLUTION IN EDUCATION
The transformation should be as follows:
AD ALTERNATIVE EDUCATION
JH KNOWLEDGE DEVELOPMENT
AZ EVOLUTION IN EDUCATION
How to achieve this in Postgresql?
Thanks
Building on your current regex, I can recommend
SELECT REGEXP_REPLACE(
REGEXP_REPLACE('J. & H. KNOWLEDGE DEVELOPMENT', '\m([[:alpha:]]{1,2})\M\s*\W*(?=[[:alpha:]]{1,2}\M)', '\1'),
'^([[:alpha:]]+)\W+',
'\1 '
)
See the online demo, yielding
regexp_replace
1 JH KNOWLEDGE DEVELOPMENT
It is a two step solution. The first regex matches
\m([[:alpha:]]{1,2})\M
- a whole one or two letter words captured into Group 1 (\m
is a leading word boundar, and \M
is a trailing word boundary)\s*
- zero or more whitespaces\W*
- zero or more non-word chars(?=[[:alpha:]]{1,2}\M)
- a positive lookahead that requires a whole one or two letter word immediately to the right of the current position.The match is replaced with the contents of Group 1 (\1
).
The second regex matches a letter word at the start of the string and replaces all non-word chars after it with a space.