
Regular Expressions - Select the Second Match


I have a txt file with <i> and </i> tags between words that I would like to remove using Editpad.

For example, I'd like to keep the tags when a line looks like this:

<i>Phrases and words.</i>

And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:

<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>

I was trying to do that using regex, but I couldn't do it.

As the tag is followed by a space or a word character, I could find the lines that have the double tag with

/ <i>|<\/i> /

but this way I can't just replace every match with nothing; I have to edit each line I find, one by one.

Is there any way to accomplish that?

* Edited *

Another example of lines found in the subtitle text:

<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>

Solution

  • Rule number one: you can't parse HTML with regex.

    That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)

    If I've understood correctly, it looks like you can simply remove all <i> and </i> that aren't either at the beginning or end of the lines. In that case, one method you could try is the following regex:

    (?<=.)\<\/?i\>(?=.)
    

    This will match the tags, with a lookahead and a lookbehind to make sure that we aren't at the start or end of a line (by checking that another character exists before and after the tag). (Note that characters matched inside a lookahead or lookbehind are not part of the match, so they won't be replaced when you search/replace.)

    Disclaimer: this works on regex101, but Notepad++ may have some differences from the PCRE regex style.

    Update to work with Editpad

    EDIT: since the question is actually about how to do this in Editpad, below is a modified alternative:

    Try searching for the regex (.)\<\/?i\>(.) instead. This will match (and capture) exactly one character before and one character after each <i> or </i> tag.

    When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.
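
To try both patterns outside an editor, here is a minimal Python sketch using the standard re module. The function names are made up for this example, and the backslashes before <, > and / are dropped because Python does not need them. Editpad's regex flavor may differ from Python's, so treat this as an illustration of the idea rather than a guarantee of what the editor will do.

```python
import re

# Lookaround version: match a tag only if some character precedes and follows it.
# The lookbehind/lookahead are zero-width, so only the tag itself is removed.
LOOKAROUND = re.compile(r"(?<=.)</?i>(?=.)")

# Capture-group version: consume one character on each side of the tag
# and put both characters back with the replacement \1\2.
CAPTURE = re.compile(r"(.)</?i>(.)")


def strip_inner_tags_lookaround(line):
    """Remove <i> and </i> tags that are not at the start or end of the line."""
    return LOOKAROUND.sub("", line)


def strip_inner_tags_capture(line):
    """Same idea, using captured neighbours instead of lookarounds."""
    return CAPTURE.sub(r"\1\2", line)


if __name__ == "__main__":
    # Sample lines copied from the question.
    samples = [
        "<i>Phrases and words.</i>",
        "<i>Phrases</i>and<i> words.</i>",
        "<i>Phrases</i>and <i>words.</i>",
    ]
    for line in samples:
        print(strip_inner_tags_lookaround(line), "|", strip_inner_tags_capture(line))
```

Two caveats, both easy to check with the sketch. First, the capture-group version consumes the neighbouring characters, so when two tags are separated by a single character (for example <i>a</i>b<i>c</i>) one pass leaves the second tag in place and a second pass is needed; the lookaround version does not have this problem. Second, on a line such as - What? <i>Go. Go, go, go!</i> the opening tag is not at the start of the line, so both patterns remove it and leave the closing </i> behind; lines like that would need separate handling.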