Remove any tag attributes from all closing tags found in a poorly written XML string


I'm trying to use preg_replace() to sanitize poorly written XML.

$x = '<abc x="y"><def x="g">more test</def x="g"><blah>test data</blah></abc x="y">';

The logic is to check whether a closing tag (</...>) contains a space and, if so, delete everything from that space up to, but not including, the closing >.

Desired result:

<abc x="y"><def x="g">more test</def><blah>test data</blah></abc>

Solution

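  • This should do it. Note that preg_replace() returns the modified string rather than changing $x in place, so assign the result:

    $x = preg_replace('/<\/(\w+)\s*[^>]*>/', '</\1>', $x);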
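One limitation of the pattern above is that \w+ only matches letters, digits and underscores, so a closing tag such as </ns:item id="1"> or </my-tag x="y"> would have its name cut at the first : or -. Below is a minimal sketch of a slightly more tolerant variant, assuming PHP 7 or later. The function name stripClosingTagAttributes is hypothetical and used only for illustration. It treats the tag name as any run of characters that are neither whitespace nor >, and only rewrites closing tags that actually contain whitespace.

```php
<?php

// Hypothetical helper (not from the original answer): removes anything
// that follows the tag name inside a closing tag.
function stripClosingTagAttributes(string $xml): string
{
    // <\/          literal "</"
    // ([^\s>]+)    tag name: one or more characters that are not whitespace or ">"
    // \s[^>]*      a whitespace character, then everything up to the next ">"
    // >            end of the tag
    $result = preg_replace('/<\/([^\s>]+)\s[^>]*>/', '</$1>', $xml);

    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $xml;
}

$x = '<abc x="y"><def x="g">more test</def x="g"><blah>test data</blah></abc x="y">';

echo stripClosingTagAttributes($x), PHP_EOL;
// <abc x="y"><def x="g">more test</def><blah>test data</blah></abc>
```

Like the original pattern, this assumes that no stray attribute value inside a closing tag contains a > character. If that can happen in your input, a regex is probably not enough and the string would need a more careful tokenizer.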