php, web-scraping, domdocument

PHP Scrape nested pages


I am new to web scraping and need to learn quickly for work. I am having trouble scraping a client's web page because the content I need to acquire is nested: each of the 300+ records on the main page has its own child page, some fields on the child pages are not wrapped in tags, and the markup is a bit of a mess. What would be the best logic for getting this information? (Also, if anyone knows of any newer scraping tools that are free and worth looking into, that would be awesome.) I am able to get all of the records on the parent page. I just don't know how to hop through each record to access its child page information and grab it before moving to the next row on the parent page.


Solution

  • foreach top-level page {
        html = fetch page
        data = process html
        while (there are more descendant pages) {
            html = fetch next page using data
            data = process html
        }
        save this data chain
    }
    

    But if you're struggling with the above logic, I'd recommend skipping the code and focusing your time on learning one of the existing tools. You're almost certain to save time, especially if you'll be scraping often.
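For anyone who does want to write it by hand, here is a minimal sketch of the loop above for the two-level case in the question (one parent page, one child page per record), using PHP's built-in DOMDocument and DOMXPath. The function names (`loadXPath`, `scrapeRecords`), the example URLs, the HTML, and the XPath queries are all made up for illustration, since the client's actual markup isn't shown. It also assumes the record links are absolute URLs; relative ones would need resolving against the parent URL first.

```php
<?php
// Hypothetical sketch: function names, URLs, HTML and XPath queries are assumptions.

/** Parse an HTML string and return a DOMXPath for querying it. */
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings from messy markup instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

/**
 * For each record link on the parent page, fetch the child page and
 * collect its fields before moving on to the next row.
 *
 * $fetch        any callable mapping a URL to an HTML string
 * $linkQuery    XPath selecting the <a> elements that lead to child pages
 * $fieldQueries field name => XPath evaluated on each child page
 */
function scrapeRecords(string $parentUrl, callable $fetch, string $linkQuery, array $fieldQueries): array
{
    $records = [];
    $parent = loadXPath($fetch($parentUrl));

    foreach ($parent->query($linkQuery) as $link) {
        $childUrl = $link->getAttribute('href'); // assumed to be an absolute URL
        $child = loadXPath($fetch($childUrl));

        $record = ['url' => $childUrl];
        foreach ($fieldQueries as $name => $query) {
            $nodes = $child->query($query);
            // Take the first match, or null when the field is missing on this page.
            $record[$name] = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
        }
        $records[] = $record;
    }
    return $records;
}

// Usage with in-memory pages standing in for the real site.
$pages = [
    'http://example.test/list' =>
        '<html><body><table>'
        . '<tr><td><a class="record" href="http://example.test/r/1">One</a></td></tr>'
        . '<tr><td><a class="record" href="http://example.test/r/2">Two</a></td></tr>'
        . '</table></body></html>',
    'http://example.test/r/1' =>
        '<html><body><h1>Alice</h1><div id="info">Phone: 555-0100<br>City: Austin</div></body></html>',
    'http://example.test/r/2' =>
        '<html><body><h1>Bob</h1><div id="info">Phone: 555-0101<br>City: Boston</div></body></html>',
];
$fetch = function (string $url) use ($pages): string {
    return $pages[$url];
};

$records = scrapeRecords(
    'http://example.test/list',
    $fetch,
    '//a[@class="record"]',
    [
        'name'  => '//h1',
        // text() nodes reach text that is not wrapped in its own tag.
        'phone' => '//div[@id="info"]/text()[1]',
    ]
);
print_r($records);
```

Passing the fetcher in as a callable keeps the loop testable without network access. Against a live site, `'file_get_contents'` could be passed as `$fetch` where `allow_url_fopen` is enabled, or a cURL-based function otherwise. Fields that are "not in tags" can often still be reached with `text()` nodes in XPath, as in the `phone` query here.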