I am trying to scrape Walmart's products. The page I am trying to pull is https://www.walmart.com/search/?query=&cat_id=91083 and I am only able to scrape about 10 products from it. Here is the code I am using:
const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://www.walmart.com/search/?query=&cat_id=91083').then(res => {
  const combino1 = [];
  const $ = cheerio.load(res.data);
  // Collect the text of every product title link in the returned HTML
  $('a.product-title-link').each((index, element) => {
    const name = $(element).first().text();
    combino1[index] = { name };
  });
  console.log(combino1);
});
When I search the DOM in the browser for a.product-title-link, it shows 40 products. Why am I only able to grab 10 and not 40?
Your issue is that a call with axios will only get you the HTML provided by the server. This means that any products fetched afterwards by asynchronous calls to other parts of their system will never be in that response. Simply writing the received data to a new file shows this:
const fs = require('fs')
...
fs.writeFileSync('./data.html', res.data)
Opening the new data.html file will show only 10 occurrences of product-title-link.
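If you would rather count the matches than search the file by hand, here is a rough sketch. The helper name countClassOccurrences is made up for this example, and it assumes class attributes are written with double quotes, so treat the number as an approximation rather than what a real HTML parser would report.

```javascript
const fs = require('fs');

// Hypothetical helper: counts elements whose double-quoted class attribute
// contains the given class name. A regex is only a rough check, not a parser.
function countClassOccurrences(html, className) {
  const classAttr = /class\s*=\s*"([^"]*)"/g;
  let count = 0;
  for (const match of html.matchAll(classAttr)) {
    if (match[1].split(/\s+/).includes(className)) {
      count++;
    }
  }
  return count;
}

// Only runs if data.html was written by the snippet above
if (fs.existsSync('./data.html')) {
  const html = fs.readFileSync('./data.html', 'utf8');
  console.log(countClassOccurrences(html, 'product-title-link'));
}
```

If the count matches what cheerio gives you, the missing products are simply not in the server's response.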
For that you can't use axios on its own. You need a library that drives a real browser, for example Puppeteer, because with it you can wait for all the products to be loaded before traversing the DOM.
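A minimal sketch of what that could look like, assuming Puppeteer is installed (npm install puppeteer) and that the a.product-title-link selector from the question still matches the rendered page. The function name scrapeProductNames and the SCRAPE environment variable are just choices for this example. Waiting for networkidle2 is one possible strategy; the site may also need scrolling to load every product, or may block headless browsers, so the result is not guaranteed to contain all 40.

```javascript
// Takes the puppeteer module as an argument so the function itself
// has no hard dependency until you actually run it.
async function scrapeProductNames(puppeteer, url, selector = 'a.product-title-link') {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled so late requests can finish
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Make sure at least one product link has been rendered
    await page.waitForSelector(selector);
    // Runs inside the page: read the text of every matching link
    return await page.$$eval(selector, links =>
      links.map(link => ({ name: link.textContent.trim() }))
    );
  } finally {
    await browser.close();
  }
}

// Run only when explicitly requested, e.g. SCRAPE=1 node scrape.js
if (process.env.SCRAPE) {
  const puppeteer = require('puppeteer');
  scrapeProductNames(puppeteer, 'https://www.walmart.com/search/?query=&cat_id=91083')
    .then(products => console.log(products.length, products))
    .catch(err => console.error(err));
}
```

The returned array has the same shape as combino1 in the question, so the rest of your code should not need to change.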