
Regex: capture specific string with conditions


I'm trying to capture just the string u00, because I need to replace it with \u00.

Sometimes these characters are preceded by a \; in that case, I don't want to capture them. At other times the preceding symbol is a double quote ("); in that case I do want to capture them, but only u00, not "u00.

I'm trying this:

file_modified = re.sub(r'[^\\|^\s](u00)', r'\\u00', original_file)

This captures the " as well, and I don't know how to skip it; I only want to capture u00.
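To see why the " gets swallowed, here is a quick check of the pattern above on a few made-up sample strings. The character class [^\\|^\s] is not a condition on the preceding character: it consumes one character, so that character becomes part of the match and is overwritten by the replacement. (Inside a class, | and a non-leading ^ are literal characters, not alternation or an anchor.) Because one character must precede u00, a u00 at the very start of the string or right after whitespace is never matched either.

```python
import re

# The attempt from the question: one character that is not "\", "|", "^"
# or whitespace, followed by the group (u00).
attempt = r'[^\\|^\s](u00)'

samples = ['"u00', 'xu00', 'u00', 'a u00', r'\u00']
for s in samples:
    # The whole match (preceding character + u00) is replaced, so the
    # preceding character is lost whenever the pattern matches.
    print(s, '->', re.sub(attempt, r'\\u00', s))
```

The first two samples lose their leading character, and the plain u00 and the one after a space are left unchanged.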


Solution

  • Just match the backslash optionally:

    file_modified = re.sub(r'\\?u00', r'\\u00', original_file)
    

    Here,

    • \\?u00 - matches an optional \ and u00
    • \\u00 - is the replacement pattern; the escaped \\ produces a single \, so every match is replaced with \u00

    Thus, if there is already a \ before u00, it is neither removed nor doubled; if it is missing, it is added.

    See the Python demo:

    import re
    original_file = r"u00 because i need to replace it to \u00"
    print(re.sub(r'\\?u00', r'\\u00', original_file))
    # => \u00 because i need to replace it to \u00
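If you would rather skip the preceding character than consume it, which is what the character class in the question was trying to do, a zero-width negative lookbehind is one alternative (a different technique from the optional-backslash pattern above): (?<!\\)u00 matches u00 only when the character before it is not a backslash, and that character is never part of the match, so a preceding " stays in place. A minimal sketch; the helper name escape_u00 is hypothetical:

```python
import re

# Match "u00" only when the character before it is not a backslash.
# (?<!\\) is a negative lookbehind: it inspects the previous character
# without consuming it, so a preceding '"' is left in the output.
pattern = re.compile(r'(?<!\\)u00')

def escape_u00(text):
    # Hypothetical helper. In the replacement template, \\ is an escaped
    # backslash, so each match becomes a single backslash followed by u00.
    return pattern.sub(r'\\u00', text)

print(escape_u00(r'"u00 and \u00 and u00'))
# => "\u00 and \u00 and \u00
```

Like the answer's pattern, it is safe to run more than once: after one pass every u00 is preceded by a backslash, so a second pass changes nothing.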