I have an unformatted XML file in which I would like to delete elements of a specific name that contain some value.
Example:
<XmlElement1>
</XmlElement1>
<XmlElement2 ... >
...
<Xml1SubElement someParameter="...SearchTerm..."/>
...
</XmlElement2>
<XmlElement3/>
Here ... stands for arbitrary characters, possibly spanning multiple lines.
In the above example I would like to delete all XmlElement2 elements that contain "SearchTerm" in the body. In other words, select all text between <XmlElement2 and </XmlElement2> across multiple lines where SearchTerm is in the middle, and replace it with "".
I'm using UltraEdit on macOS, but I am flexible about which tools to use.
Your help is much appreciated!
One Perl regular expression search string that works for this task is:
(?s)^[\t ]*<XmlElement2(?:.(?!</XmlElement2>))+?SearchTerm.+?</XmlElement2>[\t ]*(?:\r?\n|\r)
Explanation:
(?s) ... flag that makes the dot in the search expression also match newline characters.
^[\t ]* ... start the search at the beginning of a line and match 0 or more tabs or spaces.
<XmlElement2 ... the start tag of the element to remove if it contains SearchTerm.
(?:.(?!</XmlElement2>))+? ... a non-capturing (non-marking) group that matches any character one or more times, non-greedy, as long as the string after the current character is not </XmlElement2>. Without the negative lookahead (?!</XmlElement2>), a match could start at <XmlElement2 and run across one or more </XmlElement2> and <XmlElement2 tags until SearchTerm is found anywhere in the file.
SearchTerm ... the string which must be found inside element XmlElement2.
.+? ... any character (including newline characters) one or more times, non-greedy. Non-greedy here means that matching stops at the next occurrence of </XmlElement2> and not at the last occurrence of </XmlElement2> in the file.
</XmlElement2> ... the end tag of the XML element to remove if it contains SearchTerm.
[\t ]*(?:\r?\n|\r) ... 0 or more tabs or spaces followed by a DOS/Windows (carriage return + line-feed), UNIX (just line-feed) or classic Mac (just carriage return) line ending.
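For anyone who would rather script this than use the editor, here is a minimal sketch of the same search and replace in Python. It is not part of the tested UltraEdit solution. Two things are assumptions on my side: the (?m) flag is added because Python's ^ matches at the start of each line only in multiline mode, and the function name remove_matching_elements as well as the sample data (including the id attributes) are made up for illustration.

```python
import re

# Port of the search string above to Python's re module.
# (?m) is an addition: without it, ^ would match only at the very start of the text.
PATTERN = re.compile(
    r"(?sm)^[\t ]*<XmlElement2(?:.(?!</XmlElement2>))+?SearchTerm"
    r".+?</XmlElement2>[\t ]*(?:\r?\n|\r)"
)


def remove_matching_elements(text: str) -> str:
    """Delete every XmlElement2 block whose body contains SearchTerm."""
    return PATTERN.sub("", text)


# Invented sample: element "a" lacks the search term, element "b" contains it.
sample = (
    "<XmlElement1>\n"
    "</XmlElement1>\n"
    "<XmlElement2 id=\"a\">\n"
    "  <Xml1SubElement someParameter=\"other\"/>\n"
    "</XmlElement2>\n"
    "<XmlElement2 id=\"b\">\n"
    "  <Xml1SubElement someParameter=\"xxSearchTermxx\"/>\n"
    "</XmlElement2>\n"
    "<XmlElement3/>\n"
)

cleaned = remove_matching_elements(sample)
print(cleaned)
```

In this sample only the element with id="b" should be removed. The element with id="a" stays because the lookahead stops the first group before its own end tag is reached, so the match cannot run on into the next element to find SearchTerm there.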
PS: The Perl regular expression replace was tested with UltraEdit for Windows v22.20.0.49 on Windows XP and v25.20.0.88 on Windows 7 as I don't have a Mac.