I am experimenting with website-scraping techniques. For the example link, the description always comes back empty. The reason is that it is populated by JavaScript with the following code. How do we go about handling these kinds of scenarios?
// Frontend JS
P.when('DynamicIframe').execute(function (DynamicIframe) {
    var BookDescriptionIframe = null,
        bookDescEncodedData = "book desc data",
        bookDescriptionAvailableHeight,
        minBookDescriptionInitialHeight = 112,
        options = {},
        iframeId = "bookDesc_iframe";
    // ... (rest of the handler omitted)
});
I am using PHP DOMXPath as below:
// I am saving the returned HTML to a file and reading that file here.
$file = 'sample.html';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
@$dom->loadHTMLFile($file);

$xpath = new DOMXPath($dom);

// This XPath works in the Chrome console, but not here,
// because the content is created dynamically via JS.
$desc = $xpath->query('//*[@id="bookDesc_iframe"]');
Whenever you see this kind of JavaScript-generated content, especially from big players like Amazon or Google, you should immediately suspect that there is a graceful-degradation fallback.
That is, a plain-HTML version of the content is served for clients where JavaScript doesn't run (such as the Links text-mode browser), for better browser coverage.
Look for a <noscript> element.
You may well find one, and with that you can solve the problem.
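
For example, here is a minimal sketch of how that could look with the same DOMXPath setup, assuming the saved HTML contains a <noscript> fallback near the description. The container id bookDescription_feature_div used in the XPath is an assumption; inspect your saved file to find the element that actually wraps the <noscript> block.

// Minimal sketch: read the <noscript> fallback instead of the JS-built iframe.
$file = 'sample.html';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
@$dom->loadHTMLFile($file);
$xpath = new DOMXPath($dom);

// Assumed container id -- verify it against your saved HTML.
$nodes = $xpath->query('//div[@id="bookDescription_feature_div"]//noscript');

if ($nodes->length > 0) {
    // textContent concatenates all descendant text; trim the surrounding whitespace.
    $description = trim($nodes->item(0)->textContent);
    echo $description, PHP_EOL;
} else {
    echo 'No <noscript> fallback found; inspect the raw HTML manually.', PHP_EOL;
}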