Search code examples
pythonregexreplacebackslashslash

Replace all slash by backslash within specified tag for file path


I have xml like file with tags like:

<id>SomeID</id>
<datasource>C:/projects/my_project/my_file.jpg</datasource>
<title>My title can include / and other characters</title>
<abstract></abstract>

I want to change all slashes to backslashes but only within tag datasource (within opening and closing tag).

What is the general regex syntax to do that? UPDATE: I finally got to the first working solution with python:

regex_01 = re.compile(".*<datasource>")
regex_02 = re.compile("</datasource>.*")
file_content = ""        
for line in source_file.readlines():
    if "<datasource>" in line:
        start = regex_01.search(line).group()
        end = regex_02.search(line).group()
        part_to_replace = line.replace(start,"").replace(end,"")
        replaced = part_to_replace.replace("/","\\")
        file_content = file_content + start + replaced.strip() + end + "\n"
    else:
        file_content = file_content + line    

Could you suggest something more elegant?


Solution

  • You can try this with the skip/fail syntax:

    (?:<datasource>[^/]*?|.*(?=<datasource>)|(?=</datasource>).*)(*SKIP)(*FAIL)|/
    

    See it working here: https://regex101.com/r/86gc4d/1.

    But this one is for PCRE. In python, (*FAIL) can also be (?!) but for (*SKIP) I am not sure.

    If I am not wrong it should be added in latest python regex engine: https://pypi.python.org/pypi/regex.

    You can find the documentation of (*SKIP)(*FAIL) syntax here: http://www.rexegg.com/backtracking-control-verbs.html#skipfail, where it also says it works in Python in the example of that paragraph:

    # Python
    # if you don't have the regex package, pip install regex
    
    import regex as mrab
    
    # print(regex.__version__) should output 2.4.76 or higher
    print(mrab.findall(r'{[^}]*}(*SKIP)(*FAIL)|\b\w+\b',
                       'good words {and bad} {ones}'))
    # ['good', 'words']
    

    Hope it helps!