With the following HTML:
<div class="borderTopWhite item-sub-details">
<ul>
<li>Sport: <a href="#">Baseball</a></li>
<li>Team:
<a href="#">Philadelphia Phillies</a>
</li>
</ul>
</div>
My goal is to get the text of the Team item. In the above example, I would like to get Philadelphia Phillies.
My (failed) attempt returns an array of empty objects:
const details = await page.$$eval(
'.item-sub-details ul li',
els => els.map(el => el)
)
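The attempt above comes back as empty objects because the `$$eval` callback runs inside the page, and its return value has to be serialized back to Node. DOM elements don't survive that trip, but strings do, so mapping to `el => el.textContent` gets usable data out. Below is a minimal sketch of post-processing those strings in Node. It assumes every item follows the `Label: value` shape of the markup in the question, and `parseDetails` is a hypothetical helper name, not part of Puppeteer:

```javascript
// Turn the text of each <li> (e.g. "Sport: Baseball", "Team:\n  Philadelphia Phillies")
// into a { label: value } object, splitting on the first colon only.
// Assumption: every item looks like "Label: value".
function parseDetails(texts) {
  const details = {};
  for (const text of texts) {
    const i = text.indexOf(":");
    if (i === -1) continue; // skip items without a "Label:" prefix
    const label = text.slice(0, i).trim();
    // Collapse the newlines/indentation that textContent keeps from the markup.
    const value = text.slice(i + 1).replace(/\s+/g, " ").trim();
    details[label] = value;
  }
  return details;
}

// Sample strings shaped like what textContent returns for the question's markup.
const texts = ["Sport: Baseball", "Team:\n  Philadelphia Phillies\n"];
console.log(parseDetails(texts).Team); // Philadelphia Phillies
```

With Puppeteer, you would feed it the array from `page.$$eval(".item-sub-details ul li", els => els.map(el => el.textContent))`. This route is handy when you want every field rather than only Team.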
Try selecting the parent <li> by the text Team:, then query the <a> within it:
const puppeteer = require("puppeteer"); // ^22.6.0
const url = "<Your URL>";
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
// The data is in the static HTML, so skip JS and abort every request except the page itself
await page.setJavaScriptEnabled(false);
await page.setRequestInterception(true);
page.on("request", req => {
if (req.url() === url) {
req.continue();
} else {
req.abort();
}
});
await page.goto(url, {waitUntil: "domcontentloaded"});
// Match the <li> whose text contains "Team:", then read the <a> inside it
const team = await page.$eval(
"li::-p-text('Team:')",
el => el.querySelector("a").textContent
);
console.log(team); // => Oakland Athletics
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Note that you don't need Puppeteer for this, since the data is in the static HTML. That's why I disabled JS and blocked every request except the page itself in the above code. Instead, use fetch (built into Node 18+) and the lightweight HTML parser Cheerio:
const cheerio = require("cheerio"); // ^1.0.0-rc.12
const url = "<Your URL>";
fetch(url)
.then(res => {
if (!res.ok) {
throw Error(res.statusText);
}
return res.text();
})
.then(html => {
const $ = cheerio.load(html);
// :contains() is a Cheerio (css-select) extension, not standard CSS
const team = $("li:contains(Team:)").find("a").text();
console.log(team); // => Oakland Athletics
})
.catch(err => console.error(err));
Rough timings of each approach, as reported by the shell's time command:
Fetch and Cheerio:
real 0m1.158s
user 0m0.315s
sys 0m0.043s
Optimized Puppeteer:
real 0m1.489s
user 0m0.650s
sys 0m0.174s
Unoptimized Puppeteer (without disabling JS or blocking requests):
real 0m2.260s
user 0m0.980s
sys 0m0.263s
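To put those numbers side by side, here is a small sketch that divides each run by the Fetch and Cheerio baseline, using the figures above. It counts CPU time as user + sys. Treat the ratios as rough, since timings like these vary from run to run. The `ratios` function is just an illustrative helper:

```javascript
// Seconds copied from the `time` output above.
const runs = {
  "Fetch and Cheerio": { real: 1.158, user: 0.315, sys: 0.043 },
  "Optimized Puppeteer": { real: 1.489, user: 0.650, sys: 0.174 },
  "Unoptimized Puppeteer": { real: 2.260, user: 0.980, sys: 0.263 },
};

// Express each run relative to the baseline: wall time ("real") and CPU time (user + sys).
function ratios(runs, baseName) {
  const base = runs[baseName];
  const baseCpu = base.user + base.sys;
  const out = {};
  for (const [name, t] of Object.entries(runs)) {
    out[name] = { wall: t.real / base.real, cpu: (t.user + t.sys) / baseCpu };
  }
  return out;
}

for (const [name, r] of Object.entries(ratios(runs, "Fetch and Cheerio"))) {
  console.log(`${name}: ${r.wall.toFixed(2)}x wall time, ${r.cpu.toFixed(2)}x CPU time`);
}
```

For these numbers, optimized Puppeteer takes about 1.29x the wall time and 2.30x the CPU time of Fetch and Cheerio. Unoptimized Puppeteer takes about 1.95x and 3.47x.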