Tags: php, regex, screen-scraping

Alternative to phpQuery that doesn't crash when dealing with invalid markup?


I am using phpQuery to parse pages, but I noticed that pages with invalid markup result in

PHP Fatal error: Uncaught exception 'Exception' with message 'Error loading XML markup'

Here is an example of offending code from one such page:

<?xml version="1.0" encoding="iso-8859-2"?>
<link href="http://example.com/?foo=bar&baz=quz" />
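The snippet can be checked outside phpQuery with PHP's own DOM extension. This is a sketch, not phpQuery code, and the helper name parsesAsXml is made up for illustration. It assumes the problem is the unescaped & in the href attribute, which XML requires to be written as &amp;. Under strict XML parsing, the original snippet is rejected and the escaped version is accepted:

```php
<?php
// Sketch: reproduce the parse failure with PHP's DOM extension (not phpQuery).
// Assumption: the culprit is the bare "&" in the href attribute, which
// well-formed XML requires to be escaped as "&amp;".

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true if $markup is well-formed XML, false otherwise.
function parsesAsXml(string $markup): bool
{
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup); // false on a well-formedness error
    libxml_clear_errors();        // discard the collected error list
    return $ok !== false;
}

$bad = '<?xml version="1.0" encoding="iso-8859-2"?>' . "\n"
     . '<link href="http://example.com/?foo=bar&baz=quz" />';

// Same markup with the ampersand escaped.
$fixed = str_replace('&baz', '&amp;baz', $bad);

var_dump(parsesAsXml($bad));   // bool(false)
var_dump(parsesAsXml($fixed)); // bool(true)
```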

I wish phpQuery would return false for such pages, but instead it fails immediately with a fatal error, which prevents me from handling the problem.

The error happens as soon as phpQuery is initialized, e.g. with phpQuery::newDocumentFile($page);

I really like phpQuery because it works like jQuery, but I'm looking for an alternative that handles invalid markup.


Solution

  • Actually, the crash isn't caused by the invalid markup itself. It happens because you don't catch the exception that phpQuery throws when it fails to load the invalid markup.

    Try wrapping the initialization in a try-catch block:

    try {
        $doc = phpQuery::newDocumentFile($page);
        // Process the page here
    } catch (Exception $e) {
        // Loading failed (e.g. "Error loading XML markup"):
        // log $e->getMessage(), skip the page, or return false
    }
    

    Exceptions only become fatal errors if you don't catch them.
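Since the goal was a false return value instead of a crash, the try-catch can be wrapped in a small helper. This is a sketch only: fakeNewDocument() and loadOrFalse() are hypothetical names, and fakeNewDocument() is a stand-in for phpQuery::newDocumentFile() that throws a plain Exception on bad XML so the example runs without phpQuery installed. In real code, call phpQuery in its place.

```php
<?php
// Hypothetical stand-in for phpQuery::newDocumentFile(): parses $markup as XML
// and throws an Exception on failure, so this sketch runs without phpQuery.
function fakeNewDocument(string $markup): DOMDocument
{
    libxml_use_internal_errors(true); // suppress libxml warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    if ($ok === false) {
        throw new Exception('Error loading XML markup');
    }
    return $doc;
}

/**
 * Returns the parsed document, or false if loading threw an exception.
 * The exception is caught here, so it never becomes a fatal error.
 *
 * @return DOMDocument|false
 */
function loadOrFalse(string $markup)
{
    try {
        return fakeNewDocument($markup);
    } catch (Exception $e) {
        // Log $e->getMessage() here if needed, then report failure.
        return false;
    }
}

$result = loadOrFalse('<link href="http://example.com/?foo=bar&baz=quz" />');
var_dump($result === false); // bool(true)
```

The caller can then simply skip any page for which loadOrFalse() returns false.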