php xml malformed

Malformed XML Cleanup - Data not in tags


I have XML whose tags are nested correctly, but stray characters keep showing up directly outside the tags, for example:

<root><a_tag>Some perfectly valid string</a_tag> this
<b_tag>more data</b_tag>  
<c_tag>some more data</c_tag> 0</root>

Is there a native PHP function that strips this stray text, or will I need a regex to do it?

The only cleanup function I'm currently running is the one from this answer: https://stackoverflow.com/a/3466049

Edit: When I open the file in Emacs, each line ends with a set of characters such as ^@ or ^@S (^@ is how Emacs displays a NUL byte).
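
To confirm from PHP what Emacs is showing, a minimal sketch like the following counts the control bytes that XML 1.0 does not allow. The function name `countForbiddenControlBytes` and the sample string are hypothetical and only illustrate the check; they are not from the original files.

```php
<?php
// Hypothetical helper: counts C0 control bytes that XML 1.0 forbids,
// i.e. everything below 0x20 except tab, line feed, and carriage return.
function countForbiddenControlBytes(string $xml): int
{
    // preg_match_all returns the number of matches, or false on error.
    $count = preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $xml);
    return $count === false ? 0 : $count;
}

// Made-up sample: "\0" is the NUL byte that Emacs displays as ^@.
$sample = "<a_tag>ok</a_tag>\0\n<b_tag>more</b_tag>\0S\n";
echo countForbiddenControlBytes($sample), "\n"; // prints 2
```

A non-zero count means the file contains bytes that a conforming XML parser will reject, independent of any stray text between the tags.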

Also - these documents were generated with InDesign.

Thanks!


Solution

  • After a lot of wasted time, the cause turned out to be Adobe InDesign itself, which was writing stray characters into the exported XML. The fix was to change the InDesign setting called:

    Remap Break, Whitespace, and Special Characters

    Changing it resolved the XML issues instantly.
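
If re-exporting from InDesign is not an option, a cleanup pass in PHP is one possible fallback. The sketch below is not the fix described above; it assumes the DOM extension is available, and the function name `stripStrayText` is hypothetical. It also assumes the documents never use legitimate mixed content, because it deletes every text node that sits next to an element.

```php
<?php
// Hypothetical fallback, not the original fix (which was the InDesign setting).
function stripStrayText(string $xml): string
{
    // Step 1: drop control bytes that XML 1.0 forbids (NUL and similar),
    // otherwise the parser rejects the document outright.
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml);
    if ($clean === null) {
        throw new RuntimeException('Control-byte cleanup failed');
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($clean)) {
        throw new RuntimeException('XML is still not well-formed');
    }

    // Step 2: select text nodes whose parent also has element children.
    // This matches the stray text between tags, and also the
    // whitespace-only nodes used for indentation.
    $xpath = new DOMXPath($doc);
    $stray = $xpath->query('//text()[../*]');

    // Copy to an array first so removal cannot disturb the iteration.
    foreach (iterator_to_array($stray) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the root element, without the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

// Sample modeled on the question, with a NUL byte added for illustration.
$dirty = "<root><a_tag>Some perfectly valid string</a_tag> this\n"
       . "<b_tag>more data</b_tag>\0\n"
       . "<c_tag>some more data</c_tag> 0</root>";

echo stripStrayText($dirty), "\n";
```

Text inside leaf elements such as `a_tag` is kept because their parent has no element children. Fixing the export at the source remains the cleaner option, since this pass cannot tell stray text from intentional mixed content.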