Handle initials in Postgresql

I'd like to keep together initials (max two letters) when there are punctuation or spaces in between.

I have the following snippet to tackle almost everything, but I am having issues in keeping together initials that are separated by punctuation and space. For instance, this is working on regular regex, but not in postgresql:

SELECT regexp_replace('R Z ELEMENTARY SCHOOL', 
                      '(\b[A-Za-z]{1,2}\b)\s+\W*(?=[a-zA-Z]{1,2}\b)', 
                      '\1')

The outcome should be "RZ ELEMENTARY SCHOOL". Other examples will include:

A & D ALTERNATIVE EDUCATION
J. & H. KNOWLEDGE DEVELOPMENT
A. - Z. EVOLUTION IN EDUCATION

The transformation should be as follows:

AD ALTERNATIVE EDUCATION
JH KNOWLEDGE DEVELOPMENT
AZ EVOLUTION IN EDUCATION

How to achieve this in Postgresql?

Thanks

Solution

Building on your current regex, I can recommend

SELECT REGEXP_REPLACE(
    REGEXP_REPLACE('J. & H. KNOWLEDGE DEVELOPMENT', '\m([[:alpha:]]{1,2})\M\s*\W*(?=[[:alpha:]]{1,2}\M)', '\1'),
    '^([[:alpha:]]+)\W+',
    '\1 '
)

See the online demo, yielding

    regexp_replace
1   JH KNOWLEDGE DEVELOPMENT

It is a two step solution. The first regex matches

\m([[:alpha:]]{1,2})\M - a whole one or two letter words captured into Group 1 (\m is a leading word boundar, and \M is a trailing word boundary)
\s* - zero or more whitespaces
\W* - zero or more non-word chars
(?=[[:alpha:]]{1,2}\M) - a positive lookahead that requires a whole one or two letter word immediately to the right of the current position.

The match is replaced with the contents of Group 1 (\1).

The second regex matches a letter word at the start of the string and replaces all non-word chars after it with a space.