scraping a page

What would be the best practice for scraping a horrible mess of a distributor's inventory page (it uses JS to document.write an opening <td>, then closes it in plain HTML)? None of the divs, tds, or other elements are labelled with an id or class.

Should I just straight up preg_match(_all) the thing, or is there some XPath magic I can do? There is no API, no feeds, no XML, nothing clean at all.

edit:

- What I'm basically thinking of at the moment is something like http://pastebin.com/raw.php?i=EuMfRVD5 - is that my best bet, or is there another way?

Solution

Your example is not enough of an example. But since you seemingly don't need the highlighting meta info anyway, the JS-obfuscation could be undone with a bit of:

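// The optional group sits directly before </script>; a lazy .*? after it would match first and leave $1 empty.
$html = preg_replace('# <script\b [^>]* > .*? (?: document\.write\("(.*?)"\) .*? )? </script> #six', '$1', $html);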

Maybe that's already good enough to pipe it through one of the DOM libraries afterwards.
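
To make the "unwrap, then parse" idea concrete, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath. The sample markup, the two-column layout, and the variable names are assumptions made up for illustration, since the real page was not posted. The pattern is a variant in which the optional document.write group sits directly before </script>, so the lazy match tries it at every position, and it only handles double-quoted strings.

```php
<?php
// Hypothetical sample standing in for the distributor's page: the opening <td>
// is written by JavaScript and closed in plain HTML.
$html = <<<'HTML'
<table>
<tr><script type="text/javascript">document.write("<td>");</script>Widget A</td><td>12</td></tr>
<tr><script type="text/javascript">document.write("<td>");</script>Widget B</td><td>0</td></tr>
</table>
HTML;

// Step 1: replace each <script> block with the string it would have written.
// Scripts without a document.write("...") call are simply removed.
$html = preg_replace(
    '#<script\b[^>]*>.*?(?:document\.write\("(.*?)"\).*?)?</script>#si',
    '$1',
    $html
);

// Step 2: parse the repaired markup; collect libxml's complaints about
// sloppy HTML internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Step 3: with no ids or classes to hook into, select cells by position.
$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    if ($cells) {
        $rows[] = $cells;
    }
}

print_r($rows);
```

Since nothing on the page is labelled, positional XPath such as //table[2]/tr/td[3] is probably the closest thing to "xpath magic" available; which table and column indexes apply depends on the real page.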