Tags: javascript, html, regex, perl, markup

Need regular expression to remove /> between two HTML markup tags except img tag


I need some help crafting a regular expression which removes /> between two HTML markup tags.

<!-- The line could look like this -->
<td align=right valign=bottom nowrap><div>January 24, 2013 /></div></td>

<!-- Or this -->
<div>Is this system supported? /></div>

<!-- Even this -->
<span>This is a span tag /></div>

<!-- It could look like any of these but I do not want /> removed -->
<img src="example.com/example.jpg"/></img>
<img src="example.com/example.jpg"/>
<img src="example.com/example.jpg"/></img>
<div id="example"><img src="example.com/example.jpg"/></div>

(Yes, I realize the img tag has no closing tag associated with it. I am dynamically editing a myriad of pages I have not created; it's not my markup.)

Here's the regex I came up with (using perl):

s|(<.*?>(?!<img).*?)(\s*/>)(?!</img>)(</.*?>)|$1$3|gi;

Is there a better regex that's more efficient or faster?

After the regex is applied to the examples above, these are the results:

<!-- The line could look like this -->
<td align=right valign=bottom nowrap><div>January 24, 2013</div></td>

<!-- Or this -->
<div>Is this system supported?</div>

<!-- Even this -->
<span>This is a span tag</div>

<!-- It could look like any of these but I do not want /> removed -->
<img src="example.com/example.jpg"/></img>
<img src="example.com/example.jpg"/>
<img src="example.com/example.jpg"/></img>
<div id="example"><img src="example.com/example.jpg"/></div>
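For anyone who wants to try the substitution outside Perl, here is a minimal sketch of the same pattern in JavaScript. The port is an assumption (the question itself uses Perl), and the names `strayClose` and `removeStray` are hypothetical. The pattern is assembled from commented parts only to show what each group does.

```javascript
// JavaScript port of the Perl pattern from the question, split into parts.
const parts = [
  String.raw`(<.*?>(?!<img).*?)`, // $1: lazily matched tag not directly followed by <img, plus the text after it
  String.raw`(\s*/>)`,            // $2: optional whitespace and the stray "/>" (dropped)
  String.raw`(?!</img>)`,         // skip a "/>" that sits right before </img>
  String.raw`(</.*?>)`,           // $3: the closing tag that follows
];
const strayClose = new RegExp(parts.join(""), "gi");

// Keep groups 1 and 3, which removes group 2 (whitespace and "/>").
function removeStray(line) {
  return line.replace(strayClose, "$1$3");
}

console.log(removeStray("<div>Is this system supported? /></div>"));
console.log(removeStray('<div id="example"><img src="example.com/example.jpg"/></div>'));
```

The first call should drop the space and `/>` before `</div>`; the second line contains no `/>` followed by a closing tag other than the one inside the img tag itself, so it should come back unchanged.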

Solution

  • A shorter solution would be:

    s/(<[^>]*>[^<]*)\/>/$1/g
    

    It groups an opening tag and any content that follows it, up to but excluding the next opening angle bracket, which would indicate another tag. It then looks for />. If one is found, the substitution removes it by keeping only the captured group.

    Update: The question was extended to also remove possible whitespace before the />. This can be done by adding \s* in front of the /> and making the [^<]* part "lazy", like so:

    s/(<[^>]*>[^<]*?)\s*\/>/$1/g
    

    See for yourself on regex101 (link updated).
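
Both substitutions can also be checked locally. The following is a rough sketch that ports them to JavaScript (an assumption; the answer itself is Perl) and runs them against sample lines from the question. The names `keepWhitespace`, `trimWhitespace`, `greedyVariant` and `strip` are hypothetical, and `greedyVariant` is not part of the answer: it is included only to show why the text part has to be lazy once `\s*` is added.

```javascript
// JavaScript ports of the two Perl substitutions from the answer.
const keepWhitespace = /(<[^>]*>[^<]*)\/>/g;     // first version
const trimWhitespace = /(<[^>]*>[^<]*?)\s*\/>/g; // updated version, lazy text part
// Hypothetical variant for comparison: \s* added, text part left greedy.
const greedyVariant = /(<[^>]*>[^<]*)\s*\/>/g;

// Replace every match with group 1 only, which drops the stray "/>".
function strip(pattern, html) {
  return html.replace(pattern, "$1");
}

const samples = [
  "<td align=right valign=bottom nowrap><div>January 24, 2013 /></div></td>",
  "<div>Is this system supported? /></div>",
  '<img src="example.com/example.jpg"/></img>',
  '<div id="example"><img src="example.com/example.jpg"/></div>',
];

for (const line of samples) {
  console.log(strip(trimWhitespace, line));
}

// The greedy text part keeps the space before "/>" inside group 1,
// so \s* matches nothing and the space survives.
console.log(JSON.stringify(strip(greedyVariant, samples[1])));
// "<div>Is this system supported? </div>"
console.log(JSON.stringify(strip(trimWhitespace, samples[1])));
// "<div>Is this system supported?</div>"
```

The img lines pass through unchanged because the `/>` there is consumed as part of the tag itself by `<[^>]*>`, and the text part `[^<]*` cannot cross into the next tag.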