python camelcasing

Elegant Python function to convert CamelCase to snake_case?


Example:

>>> convert('CamelCase')
'camel_case'

Solution

  • Camel case to snake case

    import re
    
    name = 'CamelCaseName'
    name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
    print(name)  # camel_case_name
    

    If you do this many times and the above is slow, compile the regex beforehand:

    pattern = re.compile(r'(?<!^)(?=[A-Z])')
    name = pattern.sub('_', name).lower()
    

    Note that this regex and the one immediately following rely on zero-width matches, which Python 3.6 and earlier do not handle correctly. If you need to support those older, end-of-life Python versions, see the alternatives further below that don't use lookahead/lookbehind.
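
    Wrapped up as the convert function the question asks for, the compiled pattern might look like the sketch below. The names convert and _BOUNDARY are only illustrative, and the expected outputs assume Python 3.7 or later.

```python
import re

# Compiled once at import time. Matches the empty position before every
# uppercase ASCII letter, except at the very start of the string.
_BOUNDARY = re.compile(r'(?<!^)(?=[A-Z])')

def convert(name):
    """Insert '_' before each interior capital letter, then lowercase."""
    return _BOUNDARY.sub('_', name).lower()

print(convert('CamelCase'))      # camel_case
print(convert('CamelCaseName'))  # camel_case_name
```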

    If you want to avoid converting "HTTPHeader" into "h_t_t_p_header", you can use this variant with regex alternation:

    pattern = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
    name = pattern.sub('_', name).lower()
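
    To see what the alternation changes, here is a small side-by-side comparison of the two patterns. The variable names simple and acronym_aware are made up for this sketch.

```python
import re

# Splits before every interior capital letter.
simple = re.compile(r'(?<!^)(?=[A-Z])')
# Splits only at lower->upper boundaries, or before the last capital
# of an uppercase run when that capital starts a new word.
acronym_aware = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

for name in ('HTTPHeader', 'getHTTPResponseCode', 'CamelCase'):
    print(name, '->',
          simple.sub('_', name).lower(), '|',
          acronym_aware.sub('_', name).lower())
# HTTPHeader -> h_t_t_p_header | http_header
# getHTTPResponseCode -> get_h_t_t_p_response_code | get_http_response_code
# CamelCase -> camel_case | camel_case
```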
    

    See Regex101.com for test cases (they do not include the final lowercasing step).

    You can improve readability with the verbose flag, written inline as (?x) or passed as re.X:

    pattern = re.compile(
        r"""
            (?<=[a-z])      # preceded by lowercase
            (?=[A-Z])       # followed by uppercase
            |               #   OR
            (?<=[A-Z])      # preceded by uppercase
            (?=[A-Z][a-z])  # followed by uppercase, then lowercase
        """,
        re.X,
    )
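
    For completeness, a sketch of the inline form: (?x) placed at the very start of the pattern has the same effect as passing re.X. The comparison against the compact one-line pattern is only there to show the two behave alike.

```python
import re

# Inline-flag form: (?x) has to be the first thing in the pattern.
pattern = re.compile(r"""(?x)
    (?<=[a-z])      # preceded by lowercase
    (?=[A-Z])       # followed by uppercase
    |               #   OR
    (?<=[A-Z])      # preceded by uppercase
    (?=[A-Z][a-z])  # followed by uppercase, then lowercase
""")

# The same pattern written on one line, for comparison.
compact = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

name = 'getHTTPResponseCode'
print(pattern.sub('_', name).lower())  # get_http_response_code
print(pattern.sub('_', name) == compact.sub('_', name))  # True
```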
    

    If you use the regex module instead of re, you can use the more readable POSIX character classes (which are not limited to ASCII).

    import regex  # third-party module: pip install regex

    pattern = regex.compile(
        r"""
            (?<=[[:lower:]])            # preceded by lowercase
            (?=[[:upper:]])             # followed by uppercase
            |                           #   OR
            (?<=[[:upper:]])            # preceded by uppercase
            (?=[[:upper:]][[:lower:]])  # followed by uppercase, then lowercase
        """,
        regex.X,
    )
    

    Another way to handle more advanced cases without relying on lookahead/lookbehind, using two substitution passes:

    def camel_to_snake(name):
        name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
        return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
    
    print(camel_to_snake('camel2_camel2_case'))  # camel2_camel2_case
    print(camel_to_snake('getHTTPResponseCode'))  # get_http_response_code
    print(camel_to_snake('HTTPResponseCodeXYZ'))  # http_response_code_xyz
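
    To make the two passes easier to follow, this sketch returns the intermediate string after each one. The helper name trace is hypothetical; the regexes are the same as in camel_to_snake above.

```python
import re

def trace(name):
    # Pass 1: put '_' before a capitalized word (one uppercase letter
    # followed by one or more lowercase letters).
    step1 = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    # Pass 2: put '_' between a lowercase letter or digit and an
    # uppercase letter that directly follows it.
    step2 = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', step1)
    return step1, step2, step2.lower()

print(trace('getHTTPResponseCode'))
# ('getHTTP_ResponseCode', 'get_HTTP_Response_Code', 'get_http_response_code')
```

    The first pass misses "Code" here because the preceding "e" was already consumed by the "PResponse" match; the second pass picks up that boundary.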
    

    To also handle names that already contain underscores, where the first pass would otherwise produce two or more underscores in a row:

    def to_snake_case(name):
        name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
        name = re.sub('__([A-Z])', r'_\1', name)
        name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
        return name.lower()
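
    A quick comparison shows what the extra substitution buys. Both functions are repeated here so the snippet runs on its own; the mixed input 'Camel_Case' is just an illustrative example.

```python
import re

def camel_to_snake(name):
    name = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', name).lower()

def to_snake_case(name):
    name = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    # Collapse the doubled underscore that pass 1 creates when the
    # character before a capitalized word is already '_'.
    name = re.sub(r'__([A-Z])', r'_\1', name)
    name = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()

print(camel_to_snake('Camel_Case'))  # camel__case
print(to_snake_case('Camel_Case'))   # camel_case
```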
    

  • Snake case to Pascal case

    name = 'snake_case_name'
    name = ''.join(word.title() for word in name.split('_'))
    print(name)  # SnakeCaseName
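
    One thing to be aware of: str.title() capitalizes the first letter of every run of letters, so a letter that follows a digit is uppercased too. If that is not what you want, str.capitalize() only touches the first character of each word. A small sketch, with hypothetical function names:

```python
def snake_to_pascal(name):
    # title(): every letter that follows a non-letter is uppercased.
    return ''.join(word.title() for word in name.split('_'))

def snake_to_pascal_capitalize(name):
    # capitalize(): only the first character of each word is uppercased.
    return ''.join(word.capitalize() for word in name.split('_'))

print(snake_to_pascal('snake_case_name'))              # SnakeCaseName
print(snake_to_pascal('utf8string_value'))             # Utf8StringValue
print(snake_to_pascal_capitalize('utf8string_value'))  # Utf8stringValue
```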