I have an XML-like file with tags like:
<id>SomeID</id>
<datasource>C:/projects/my_project/my_file.jpg</datasource>
<title>My title can include / and other characters</title>
<abstract></abstract>
I want to change all slashes to backslashes, but only inside the datasource tag (between its opening and closing tags).
What is the general regex syntax to do that?
UPDATE: I finally got a first working solution in Python:
import re

# Match everything up to and including the opening tag, and from the closing tag onward.
regex_01 = re.compile(".*<datasource>")
regex_02 = re.compile("</datasource>.*")
file_content = ""
for line in source_file.readlines():
    if "<datasource>" in line:
        start = regex_01.search(line).group()
        end = regex_02.search(line).group()
        # Strip both tags to isolate the path, then swap the slashes.
        part_to_replace = line.replace(start, "").replace(end, "")
        replaced = part_to_replace.replace("/", "\\")
        file_content = file_content + start + replaced.strip() + end + "\n"
    else:
        file_content = file_content + line
Could you suggest something more elegant?
You can try this with the skip/fail syntax:
(?:<datasource>[^/]*?|.*(?=<datasource>)|(?=</datasource>).*)(*SKIP)(*FAIL)|/
See it working here: https://regex101.com/r/86gc4d/1.
Note that this pattern is for PCRE. In Python, (*FAIL) can also be written as (?!), but I am not sure about (*SKIP).
If I am not mistaken, it is supported by the newer regex module for Python (a separate package, not the built-in re): https://pypi.python.org/pypi/regex.
You can find the documentation of the (*SKIP)(*FAIL) syntax here: http://www.rexegg.com/backtracking-control-verbs.html#skipfail. The example in that section also shows it working in Python:
# Python
# if you don't have the regex package, pip install regex
import regex as mrab
# print(mrab.__version__) should output 2.4.76 or higher
print(mrab.findall(r'{[^}]*}(*SKIP)(*FAIL)|\b\w+\b',
                   'good words {and bad} {ones}'))
# ['good', 'words']
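If installing the third-party regex module is not an option, a similar result may be possible with the standard re module by matching the whole datasource element and rewriting only its content in a replacement function. This is a minimal sketch rather than the skip/fail technique itself: the names DATASOURCE, fix_datasource_slashes and swap are made up for illustration, and it assumes datasource elements are not nested and carry no attributes.

```python
import re

# Matches one <datasource>...</datasource> element. The three groups keep the
# tags apart from the content, so only the content gets rewritten.
DATASOURCE = re.compile(r"(<datasource>)(.*?)(</datasource>)", re.DOTALL)


def fix_datasource_slashes(text):
    """Replace forward slashes with backslashes inside <datasource> only."""
    def swap(match):
        opening, content, closing = match.groups()
        # The return value of a replacement function is inserted literally,
        # so the backslashes need no extra escaping here.
        return opening + content.replace("/", "\\") + closing

    return DATASOURCE.sub(swap, text)


sample = (
    "<id>SomeID</id>\n"
    "<datasource>C:/projects/my_project/my_file.jpg</datasource>\n"
    "<title>My title can include / and other characters</title>\n"
)
print(fix_datasource_slashes(sample))
```

Because the whole file content is passed in at once, there is no need to loop over lines; the slash in the title element is left alone since it never falls inside a match.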
Hope it helps!