regex, notepad++, archive, winrar

Remove lines with an identical extension inside every <tag> block - Regular Expression


I created a report list with WinRAR.
Inside this list I have a text list like this:

<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
5.lwav
6.lwav
88.lwav
89.lwav
<tag>Adventures of Jack</tag>
90.lwav
91.lwav
92.lwav
93.lwav
!Sound Bank.xsb

I want to remove duplicate extensions inside every tag and end up with text like this:

<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
<tag>Adventures of Jack</tag>
90.lwav
!Sound Bank.xsb

or, even better:

<tag>Adventures of Shuggy</tag>
.png
.txt
.lwav
<tag>Adventures of Jack</tag>
.lwav
.xsb

Is there a regular expression I can use in Notepad++ to remove lines with an identical extension (for example, repeated .txt or repeated .lwav lines) inside every <tag> block?
Can I use Excel for this?


Solution

  • Tossed this together real quick. It should work fine in Notepad++. If you try it in another regex tool first, make sure to set the global flag (if there is one) and the multiline one.

    /^.+(\.[^.]+)$(?=\s*(?:(?!<tag>)[^.])+\1)|^(?!<tag>)[^.]+/gm
    

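    In Notepad++ itself you should (most likely) not type the flags and delimiters like I have above: leave out the leading / and the trailing /gm, put only the pattern in "Find what", choose the "Regular expression" search mode with ". matches newline" unchecked, leave "Replace with" empty, and use "Replace All".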

    Explanation + demo : http://regex101.com/r/lC0lD1
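
If a script is an option, the same clean-up can be sketched in Python. As far as I can tell, the lookahead in the pattern above only compares a line with the one directly after it, so it relies on files with the same extension being listed next to each other; the sketch below tracks every extension seen since the last <tag> line instead. The function name dedupe_extensions and its extensions_only parameter are made up for this example, and it assumes each section starts with a line beginning with <tag> and that entries are plain file names without folder paths.

```python
import os


def dedupe_extensions(text, extensions_only=False):
    """Keep the first file of each extension within every <tag> section.

    Hypothetical helper: assumes sections start with a line beginning
    with "<tag>" and that every other non-blank line is a file name.
    """
    out = []
    seen = set()  # extensions already kept in the current section
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are dropped
        if stripped.startswith("<tag>"):
            seen = set()  # new section: forget earlier extensions
            out.append(stripped)
            continue
        # Compare extensions case-insensitively (.PNG == .png).
        # Files without any extension all share the empty string.
        ext = os.path.splitext(stripped)[1].lower()
        if ext in seen:
            continue  # this extension was already kept in this section
        seen.add(ext)
        out.append(ext if extensions_only and ext else stripped)
    return "\n".join(out)


sample = """<tag>Adventures of Shuggy</tag>
!Shuggy.png
!Sound Bank.txt
4.lwav
5.lwav
6.lwav
88.lwav
89.lwav
<tag>Adventures of Jack</tag>
90.lwav
91.lwav
92.lwav
93.lwav
!Sound Bank.xsb
"""

print(dedupe_extensions(sample))
print(dedupe_extensions(sample, extensions_only=True))
```

The first call keeps one full file name per extension in each section; the second prints only the extensions. To run it on a real WinRAR report, read the file into a string and pass it in place of sample.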