Search code examples
regexoracle-databaseuppercasecamelcasing

Convert CamelCase to underscore_case


I'm trying to convert column value from CamelCase to underscore_case, this is what I tried :

SELECT regexp_replace(ColumnName, '([A-Z])', '_\1', 2) FROM Table;

but this will return d_a_r_k_C_y_a_n for the value darkCyan.

How can I resolve this ?


Solution

  • This question can be tricky, possibly even a can of worms, because camel case can have some edge cases which are hard to handle. Consider the following camel case string, which happens to have an all uppercase acronym in the middle of it:

    myIBMComputerIsSlow
    

    If we want to treat each letter in the acronym IBM as a separate word, then the answer given by @wiktor covers this problem well. We would then end up with the following:

    my_I_B_M_Computer_Is_Slow
    

    If, on the other hand, we want to preserve the acronym, then we have to do more work:

    SELECT
        REGEXP_REPLACE(REGEXP_REPLACE ('myIBMComputerIsSlow', '([A-Z])([A-Z][a-z])',
                                       '\1_\2', 1, 0, 'c'),
                      '([a-z])([A-Z])', '\1_\2', 1, 0, 'c') AS output
    FROM dual
    

    This would output the following:

    my_IBM_Computer_Is_Slow
    

    In this case, we can insert an underscore break in the string under either of the two following conditions:

    • a lowercase letter is immediately followed by an uppercase letter
    • an uppercase letter is followed by upper, followed by lower

    Note that the second case is how we detect that an acronym is ending and a new camel case word is beginning.

    If you wanted the entire output in lowercase, which is what your question seems to imply, then you could add another step and use the LOWER function.

    Demo