im trying to figure out how to web scrape the price of a product into google sheets. Before i would use =importrange(A1,B1)
And have B1 = /html/body/div[3]/main/div[2]/div/div[1]/div[3]/div[2]/div/div[1]/div[1]/div/div/div/div/div/div[2]/div[1]/div[1]/div/span[1]
This was done by coping the full Xpath. So far this has been working but unfortunately it seems like that is no longer a way to do it.
Im trying to do this for both Best Buy and Walmart
if i could have an example for these two links that would be awesome! Thank you in advanced!
In bestbuy, you can retrieve all informations in a json by
=textjoin(",",,importxml(url,"//script[@type='application/ld+json']"))
and especially
=regexextract(substitute(textjoin("",,importxml(A1,"//script[@type='application/ld+json']")),char(10),""),"{""@context"":""http://schema.org/"",""@type"":""Product"".*")
then you have to parse it (I don't know which informations you wish to retrieve). You can do it by this way
function bestbuy(source, code) {
const obj = JSON.parse(source);
const v = eval('obj.' + code)
if (typeof v == 'object') {
const header = Object.keys(v[0]);
return [header, ...v.map(o => header.map(h => Array.isArray(o[h]) ? o[h].join(",") : o[h]))];
} else {
return v
}
}
use the script this way =bestbuy(A2,code)
where in A2 you have the result of the importxml function and in code one listed hereafter, for instance =bestbuy(A2,"offers.highPrice")
/* available informations =
@context
@type
name
image
url
description
sku
gtin13
model
brand.@type
brand.name
aggregateRating.@type
aggregateRating.ratingValue
aggregateRating.reviewCount
offers.@type
offers.priceCurrency
offers.seller.@type
offers.seller.name
offers.lowPrice
offers.highPrice
offers.offers => array
*/
In walmart, the situation is quite different, you have also a json but you can't fetch through google since there is a captcha.