I'm trying to alter some XML with Find&Replace in Notepad++ using regex.
This is the specific XML I'm trying to capture:
<category name="Content Server Categories:FOLDER:test category">
<attribute name="test attribuut"><![CDATA[test]]></attribute>
<attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category>
The following 'FIND' regex does the job (for now):
<(category) name="Content Server Categories:(.+?)">(.+)</(category)>
Now I need the XML to be replaced with this:
<category-FOLDER:testcategory name="Content Server Categories:FOLDER:test category">
<attribute name="test attribuut"><![CDATA[test]]></attribute>
<attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category-FOLDER:testcategory>
I tried this 'REPLACE WITH' expression:
<($1-$2) name="Content Server Categories:($2)">($3)</($1-$2)>
But that gives the following output:
<category-FOLDER:test category name="Content Server Categories:FOLDER:test category">
<attribute name="test attribuut"><![CDATA[test]]></attribute>
<attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category-FOLDER:test category>
As you can see, I get category-FOLDER:test category instead of category-FOLDER:testcategory.
The space(s) need to be removed.
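To reproduce the problem outside Notepad++, here is a rough Python sketch of the same find/replace. Assumptions: Python's re module stands in for the Notepad++ (Boost) engine, re.DOTALL stands in for '. matches newline', and the replacement uses Python's \1 group syntax instead of $1. It shows that group 2 is copied verbatim, spaces included, into the new tag name.

```python
import re

xml = '''<category name="Content Server Categories:FOLDER:test category">
<attribute name="test attribuut"><![CDATA[test]]></attribute>
<attribute name="test attribuut1"><![CDATA[test1]]></attribute>
</category>'''

# Same FIND as above; Python writes group references as \1 instead of $1
find = r'<(category) name="Content Server Categories:(.+?)">(.+)</(category)>'
replace = r'<\1-\2 name="Content Server Categories:\2">\3</\1-\2>'

# re.DOTALL lets . match newlines, like ". matches newline" in Notepad++
out = re.sub(find, replace, xml, flags=re.DOTALL)
print(out)  # the tag name still contains "test category" with its space
```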
The problem is that the input can vary. Right now it is:
<category name="Content Server Categories:FOLDER:test category">
But it could look like these examples as well:
<category name="Content Server Categories:FOLDER1:FOLDER2:test category">
<category name="Content Server Categories:FOLDER NAME:test category">
<category name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">
<category name="Content Server Categories:FOLDER:test category name">
...
How do I catch all of these correctly and remove the spaces?
EDIT: Almost forgot: '. matches newline' is ON.
One approach is to do it in 2 steps, because the spaces have to be removed in a separate pass afterwards.
First, get the required structure (note the non-greedy .+? in the third group, which prevents over-matching when the file contains more than one category):
<(category) name="Content Server Categories:(.+?)">(.+?)</(category)>
In the replacement, use your replacement without the parentheses; they are not needed there:
<$1-$2 name="Content Server Categories:$2">$3</$1-$2>
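A minimal Python sketch of step 1, assuming Python's re module as a stand-in for Notepad++ (re.DOTALL for '. matches newline', \1 instead of $1). The input is a hypothetical file with two categories: with the non-greedy (.+?) each category is rewritten separately, while the greedy (.+) from the question runs from the first <category to the last </category> and produces a single match.

```python
import re

# Hypothetical input containing two <category> elements
xml = (
    '<category name="Content Server Categories:FOLDER:test category">\n'
    '<attribute name="a"><![CDATA[x]]></attribute>\n'
    '</category>\n'
    '<category name="Content Server Categories:FOLDER NAME:other cat">\n'
    '<attribute name="b"><![CDATA[y]]></attribute>\n'
    '</category>'
)

FIND = r'<(category) name="Content Server Categories:(.+?)">(.+?)</(category)>'
FIND_GREEDY = r'<(category) name="Content Server Categories:(.+?)">(.+)</(category)>'
REPLACE = r'<\1-\2 name="Content Server Categories:\2">\3</\1-\2>'

step1 = re.sub(FIND, REPLACE, xml, flags=re.DOTALL)
greedy = re.sub(FIND_GREEDY, REPLACE, xml, flags=re.DOTALL)

print(step1)
# Number of rewritten closing tags: non-greedy vs greedy
print(step1.count('</category-'), greedy.count('</category-'))  # 2 1
```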
Then match the spaces using chained matches anchored with \G:
(?:</?category-|\G(?!^))\K\s*([\w:]+) (?!name=)
Replace with capturing group 1, $1, which keeps the name part and drops the whitespace around it.
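Python's built-in re module supports neither \G nor \K, so doing step 2 outside Notepad++ needs a different technique. The sketch below swaps in a replacement callback instead of the \G chain: it captures the tag name after <category- or </category- up to ' name="' or '>' and deletes all whitespace from it. The input string is a hypothetical result of step 1.

```python
import re

# Hypothetical output of step 1 (spaces still present in the tag name)
step1 = (
    '<category-FOLDER NAME: FOLDER NAME1:test category '
    'name="Content Server Categories:FOLDER NAME: FOLDER NAME1:test category">\n'
    '<attribute name="a"><![CDATA[x]]></attribute>\n'
    '</category-FOLDER NAME: FOLDER NAME1:test category>'
)

# Group 1: "<category-" or "</category-"; group 2: the tag name, which ends
# right before ' name="' (opening tag) or '>' (closing tag)
TAG_NAME = re.compile(r'(</?category-)(.*?)(?=\s+name="|>)')

def squeeze(m: re.Match) -> str:
    # Keep the prefix and delete every whitespace char from the tag name
    return m.group(1) + re.sub(r'\s+', '', m.group(2))

step2 = TAG_NAME.sub(squeeze, step1)
print(step2)
```

The name="..." attribute is left untouched because the pattern only fires right after the literal category- prefix.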
Explanation
(?: - Non-capturing group
</?category- - Match <category- or </category- (the / is optional)
| - Or
\G(?!^) - Assert the position at the end of the previous match, but not at the start of a line
) - Close the non-capturing group
\K - Forget what was matched so far
\s* - Match 0+ whitespace chars
([\w:]+) - Capture in group 1: 1+ word chars or :
(a single space) - Match the space that will be removed
(?!name=) - Assert that what follows is not name=
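To see how the \G chain behaves, here is a rough Python emulation. Assumption: since Python's re has no \G, Pattern.match(text, pos), which only matches starting at exactly pos, plays that role. Each chain starts right after <category- or </category-, and every following piece must begin where the previous one ended, as with \G(?!^). The sample input is hypothetical.

```python
import re

# Hypothetical output of step 1 (spaces still in the tag name)
sample = (
    '<category-FOLDER NAME:test category '
    'name="Content Server Categories:FOLDER NAME:test category">\n'
    '<attribute name="a"><![CDATA[x]]></attribute>\n'
    '</category-FOLDER NAME:test category>'
)

START = re.compile(r'</?category-')           # first alternative of (?:...|\G(?!^))
PIECE = re.compile(r'\s*([\w:]+) (?!name=)')  # the part after \K

def remove_tag_spaces(text: str) -> str:
    out = []
    pos = 0
    for start in START.finditer(text):
        if start.start() < pos:
            continue  # already consumed by a previous chain
        # Copy everything up to and including "category-" unchanged
        # (in the regex it sits before \K, so it is not replaced)
        out.append(text[pos:start.end()])
        pos = start.end()
        # Emulate \G: the next piece must start exactly where the last one ended
        while (m := PIECE.match(text, pos)):
            out.append(m.group(1))  # replacement "$1": the space is dropped
            pos = m.end()
    out.append(text[pos:])
    return ''.join(out)

print(remove_tag_spaces(sample))
```

The chain stops at the first piece that is not followed by a space, or whose space is followed by name=, which is why the attribute and the closing > are left alone.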