Search code examples
regexoracle-databasecapitalizationregexp-replace

Initcap with some restrictions


I want to capitalize the every initial letter of a full name, but with a exception: the prepositions should be lowercase.

The prepositions are: ["da", "de", "di", "do", "du", "das", "des", "dis", "dos", "dus"].

I have this code so far, but it's horrible and incomplete. It will be so huge if I do this way (putting all the exceptions I need).

DBMS_OUTPUT.PUT_LINE(
  REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE
    (INITCAP ('ronald DAS silva'), '(D|d)a', 'da'),
    '(D|d)o', 'do'), '(D|d)e', 'de'),'(D|d)o', 'do'),'(D|d)u', 'du'));

I already tried to use REGEXP_REPLACE(with this regex -> ^(d|D)[a-zA-Z]{1,2}$) inside the INITCAP function, but I didn't get success.

So, is there a way to do it easier?

EDIT:

I have this at the moment:

DBMS_OUTPUT.PUT_LINE(REGEXP_REPLACE(INITCAP('ronald DAS silva'), '([d|D][[:alpha:]]{1,2})', LOWER('\1')));

It's almost working, but the LOWER function for some reason is not doing his work, why? If I put LOWER('A') it works.

Edit 2 - Additional tests:

Every name that starts with the letter "d" will be transformed to lowercase, and it should not happen.

Then there are some cases:

  • Daniel
  • Deivid
  • Dijkstra
  • Donald
  • Duoling

Full code

DECLARE
  TYPE t_name IS VARRAY(3) OF VARCHAR2(100);
  v_names t_name := t_name('    Donald     dIs    siLvA',
                           'daniEl    da sIlvA XaVIeR   ',
                           '   DeYse De Olivier dA     loPeS');
BEGIN
  DBMS_OUTPUT.PUT_LINE(RPAD('Name w/o format', 60, ' ') ||
                            'Name formatted');
  DBMS_OUTPUT.PUT_LINE(RPAD('---------------------', 60, ' ') ||
                            '---------------------');

  FOR i IN 1..v_names.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(RPAD(v_names(i), 60, ' ') ||
                          f_format(v_names(i)));
  END LOOP;
END;

FUNCTION f_format(p_str VARCHAR2)
   RETURN VARCHAR2
AS
BEGIN
RETURN REGEXP_REPLACE(
         REGEXP_REPLACE(
           INITCAP(p_str),
         '((D)([aeiou](s|$)?))', 'd\3'),
       '[[:space:]]+', ' ');       
END;

Solution

  • I think I've got it. Initcap the whole string, then look for your pattern surrounded by spaces (remembered group 1). Within that, break it apart so group 2 is the leading capital 'D', group 3 is the rest including the optional 's'. Replace with a space, lowercase 'd', remembered group 3 and another space. I would have to advise against using this in production though, as it is way too specific to this format of a name.

    SQL> declare
         name varchar2(20) := 'ronald dis silva';
       begin
         dbms_output.put_line(
           REGEXP_REPLACE(INITCAP(name), '( (D)([aeiou](s|$)?) )', ' d\3 '));
       end;
       /
    Ronald dis Silva
    
    SQL>
    

    A name starting with a 'd' works for me:

    SQL> declare
      2       name varchar2(20) := 'donald dis silva';
      3     begin
      4       dbms_output.put_line(
      5         REGEXP_REPLACE(INITCAP(name), '( (D)([aeiou](s|$)?) )', ' d\3 '));
      6     end;
      7  /
    Donald dis Silva
    

    Example with some more complex names using a WITH clause to make it easier to plug in different names for testing:

    SQL> with tbl(name) as (
          select '    Donald     dIs    siLvA'      from dual union
          select 'daniEl    da sIlvA XaVIeR   '     from dual union
          select '   DeYse De Olivier dA     loPeS' from dual
        )
        select REGEXP_REPLACE(INITCAP(REGEXP_REPLACE(trim(name), '\s+', ' ')), '( (D)([aeiou](s|$)?) )', ' d\3 ') newname
        from tbl;
    
    NEWNAME
    --------------------------------------------------------------------------------
    Donald dis Silva
    Deyse de Olivier da Lopes
    Daniel da Silva Xavier
    
    SQL>