I'm tasked with writing an XML linter in PHP 8 that will serve as a web API. The linter must work in a verbose mode that goes through the whole document and logs every error found (up to 1000 errors) with its line number (yes, I know an XML document can be a single line, but it's a mandatory requirement).
In other words, I need an XML reader/parser module that can:
But after some study, I found that none of the PHP built-in XML extensions can satisfy these requirements.
For example, here is a "bad" XML document in which the closing tags at line 5 (<AuthorityCode>...</Authority>) and line 11 (<LastUpdateTime>...</LastUpdate>) do not match their opening tags:
<?xml version="1.0"?>
<FacilityList>
<UpdateTime>2022-09-09T08:00:00+08:00</UpdateTime>
<UpdateInterval type="SEMIAUTO">-1</UpdateInterval>
<AuthorityCode>CA</Authority>
<Facility>
<FacilityID>NFB-NR-P00501-013037-SN-S9K6VPJ36-0002</FacilityID>
<FacilityClass>01</FacilityClass>
<FacilityType>003</FacilityType>
<LocationType>1</LocationType>
<LastUpdateTime>2022-10-04T13:00:00+08:00</LastUpdate>
</Facility>
</FacilityList>
The xmllint tool from libxml2 will show both errors, at line 5 and line 11, but both XMLReader and XML Parser just stop at line 5 and won't go any further, and I can't find a way to bypass that. Yes, I've already set the XML_PARSE_RECOVER flag in XMLReader:
libxml_use_internal_errors(true);
$parser = new XMLReader();
// The literal 1 is libxml2's XML_PARSE_RECOVER option.
$parser->open($filename, null, LIBXML_NOERROR | LIBXML_NOWARNING | 1);
And it doesn't work (PHP 8.2.6).
Did I do something wrong, or is it just not possible to do what I want using the built-in XMLReader / XML expat parser? DOMDocument can process and report both errors, but I don't want to load the whole 1 GB of data into memory.
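For reference, the DOMDocument behaviour mentioned above can be reproduced with a minimal sketch like the one below. It assumes the whole document is held in a string, so it does not address the memory constraint; the variable names ($xml, $doc, $log) are illustrative only, and the 1000-entry cap simply mirrors the requirement stated earlier.

```php
<?php
// The sample document from the question, mismatched closing tags included.
$xml = <<<'XML'
<?xml version="1.0"?>
<FacilityList>
<UpdateTime>2022-09-09T08:00:00+08:00</UpdateTime>
<UpdateInterval type="SEMIAUTO">-1</UpdateInterval>
<AuthorityCode>CA</Authority>
<Facility>
<FacilityID>NFB-NR-P00501-013037-SN-S9K6VPJ36-0002</FacilityID>
<FacilityClass>01</FacilityClass>
<FacilityType>003</FacilityType>
<LocationType>1</LocationType>
<LastUpdateTime>2022-10-04T13:00:00+08:00</LastUpdate>
</Facility>
</FacilityList>
XML;

// Collect libxml errors in an internal buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);
libxml_clear_errors();

$doc = new DOMDocument();
$doc->loadXML($xml); // returns false for malformed input; errors are still buffered

// Each buffered LibXMLError carries the line number and message reported by libxml2.
$log = [];
foreach (array_slice(libxml_get_errors(), 0, 1000) as $error) {
    $log[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

echo implode(PHP_EOL, $log), PHP_EOL;
```

This is exactly the approach that becomes impractical for a 1 GB input, because loadXML() needs the full document in memory.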
[EDIT]
No, I'm not asking for third-party products; I just want to know what I can do with PHP's built-in functions: some sort of magic option in XMLReader / the XML expat parser, or example code that makes DOMDocument parse partial data from a streaming source. Or at least just tell me that "you can't do this in PHP".
I've already checked many third-party libraries, but none of them can do what I want. They either just provide a wrapper around the XML expat parser, or rely on DOMDocument to load everything into memory at the beginning.
=====
BTW, is there any reliable way to get the line number from XMLReader? Yes, I know the XMLReader::expand() trick, but it just doesn't work when the XML is malformed (such as a missing closing tag).
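For readers unfamiliar with the expand() trick, here is a minimal sketch of it on well-formed input only. The tiny document and the names $xml and $lineOf are made up for illustration; it relies on XMLReader::expand() returning a DOMNode, whose getLineNo() method reports the line on which the node was defined.

```php
<?php
// Hypothetical well-formed input, one element per line.
$xml = "<?xml version=\"1.0\"?>\n<a>\n<b>1</b>\n<c>2</c>\n</a>\n";

$reader = new XMLReader();
$reader->XML($xml);

$lineOf = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        // expand() builds a DOM node for the current element and its subtree,
        // which makes DOMNode::getLineNo() available.
        $node = $reader->expand();
        if ($node !== false) {
            $lineOf[$reader->name] = $node->getLineNo();
        }
    }
}
$reader->close();

foreach ($lineOf as $name => $line) {
    echo $name, ' starts on line ', $line, PHP_EOL;
}
```

Note that expand() has to read the element's entire subtree, so calling it on a large element works against streaming, and it is of no help once the parser has hit a well-formedness error.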
Trying to count the number of \n and \r characters myself doesn't work either, because XMLReader doesn't report anything before <FacilityList>: the <?xml version="1.0"?> declaration and the following whitespace are totally ignored.
OK, judging from other people's comments, the answer to my question seems to be "NO YOU CAN'T DO THAT IN PHP".