powershell, web-crawler, invoke-webrequest

Invoke-WebRequest not returning <a> tags


I'm trying to crawl a website, but Invoke-WebRequest is not returning the <a> tags. Here is my code:

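$url = "https://groceries.asda.com/search/jack%20daniels"
$response = Invoke-WebRequest -Uri $url
# ParsedHtml is only populated in Windows PowerShell 5.1 and earlier (it relies on the Internet Explorer engine)
$response.ParsedHtml.all.tags("a") | ForEach-Object -MemberName innerText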

I expect this to return the inner text of every <a> tag on the page, but it returns nothing. For example, it should return:

Jack Daniel's Old No. 7 Tennessee Whiskey
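
A quick way to narrow this down is to check whether the text you expect is present in the raw response body at all. If it isn't in Content, no amount of HTML parsing will find it, because the server never sent it in that response. This is a minimal sketch; Test-RawHtmlContains is a hypothetical helper name, and it searches for "Old No. 7" rather than the full product name because the apostrophe in "Daniel's" may be HTML-encoded (e.g. as &#39;) in the raw markup.

```powershell
# Hypothetical helper: returns $true if $Text appears literally in $Html.
# This is a plain, case-sensitive substring search (String.Contains);
# it does not decode HTML entities such as &#39; for an apostrophe.
function Test-RawHtmlContains {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Text
    )
    return $Html.Contains($Text)
}

# Usage against the live page (requires network access):
# $response = Invoke-WebRequest -Uri "https://groceries.asda.com/search/jack%20daniels" -UseBasicParsing
# Test-RawHtmlContains -Html $response.Content -Text "Old No. 7"
```

If this returns False for the live page while the product is visible in a browser, the content is being added client-side after the initial page loads.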

Solution

  • When you navigate to https://groceries.asda.com/search/jack%20daniels in a browser, it doesn't just load a single flat HTML page. That particular site responds with a bare-bones "skeleton" page containing a bunch of JavaScript, which the browser executes to make dozens (hundreds?) of additional requests that load the actual page contents and show the products. (If you disable JavaScript, you'll see only the bare-bones page.)

    By comparison, Invoke-WebRequest makes only a single request, which in your case retrieves just the "skeleton" page at the exact URL you give it. It doesn't emulate a browser or run the JavaScript that loads the rest of the page, so the product <a> tags don't exist in the document it returns. That's why your code isn't finding them.

    If you want to retrieve the product details, you'll need to either work out which URL returns the product results for a given search term and request that directly, or emulate a browser (e.g. with Selenium) so that the JavaScript in the skeleton page runs and automatically makes all of the additional requests needed to build the complete page.

    Neither is a trivial task, unfortunately :-(

    Chrome network trace for https://groceries.asda.com/search/jack%20daniels

    Screenshot of a Chrome network trace for https://groceries.asda.com/search/jack%20daniels

    Fiddler trace for Invoke-WebRequest -Uri "https://groceries.asda.com/search/jack%20daniels"

    Screenshot of a Fiddler trace for Invoke-WebRequest -Uri "https://groceries.asda.com/search/jack%20daniels"
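
For the first option, here is a rough sketch that assumes you have already found a request in the browser's network trace that returns the search results as JSON. The endpoint URL, the "items"/"name" property names, and Get-ProductNames are all hypothetical placeholders. The real endpoint, its parameters, and its response shape will differ, and it may also require particular headers or cookies.

```powershell
# Hypothetical: the JSON shape assumed here is for illustration only.
# Replace it with whatever the real request (found in the browser's
# network tab) actually returns.
function Get-ProductNames {
    param(
        # Object produced by Invoke-RestMethod or ConvertFrom-Json
        [Parameter(Mandatory)] $SearchResult
    )
    # Assumed shape: { "items": [ { "name": "..." }, ... ] }
    $SearchResult.items | ForEach-Object { $_.name }
}

# URL-encode the search term before putting it in the query string.
$term = "jack daniels"
$encoded = [uri]::EscapeDataString($term)   # -> "jack%20daniels"

# Hypothetical endpoint; substitute the real URL from the network trace.
# $result = Invoke-RestMethod -Uri "https://example.invalid/search-api?q=$encoded"
# Get-ProductNames -SearchResult $result
```

Invoke-RestMethod is used here because it converts a JSON response into objects automatically, which avoids parsing HTML at all.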