
Attempting to get text inside an li where the type is


Given the following HTML:

    <div class="borderTopWhite item-sub-details">
      <ul>
        <li>Sport: <a href="#">Baseball</a></li>
        <li>Team:
          <a href="#">Philadelphia Phillies</a>
        </li>
      </ul>
    </div>

My goal is to get the text of the Team item. In the example above, I would like to get Philadelphia Phillies.

My (failed) attempt returns empty objects:

    const details = await page.$$eval(
      ".item-sub-details ul li",
      els => els.map(el => el)
    );

Solution
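
First, why the attempt in the question comes back as empty objects: $$eval runs its callback inside the page and sends the return value back to Node by serializing it. DOM elements don't survive that trip, because properties like textContent are getters on the element's prototype rather than own data properties, so each <li> arrives as {}. Returning strings instead, e.g. els.map(el => el.textContent), avoids that. The sketch below models the effect in plain Node; FakeElement and serialize are hypothetical stand-ins, and a JSON round-trip only approximates Puppeteer's real serialization:

```javascript
// Hypothetical stand-in for a DOM element: like real DOM nodes, its
// textContent is a getter on the prototype, not an own data property.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get textContent() {
    return this.#text;
  }
}

// Rough approximation of sending a page-function result back to Node:
// a JSON round-trip keeps only own enumerable data properties.
function serialize(value) {
  return JSON.parse(JSON.stringify(value));
}

const lis = [
  new FakeElement("Sport: Baseball"),
  new FakeElement("Team: \n  Philadelphia Phillies \n"),
];

// What the original callback returns: the elements themselves.
const asElements = serialize(lis.map(el => el));

// What works: extract strings inside the callback, before serialization.
const asText = serialize(lis.map(el => el.textContent.trim()));

console.log(asElements); // [ {}, {} ]
console.log(asText);
```

In Puppeteer terms, the equivalent fix to the original attempt is page.$$eval(".item-sub-details ul li", els => els.map(el => el.textContent.trim())), which yields strings you can then filter for Team.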

  • Select the <li> containing the text Team: with Puppeteer's ::-p-text() selector, then query the <a> within it:

    const puppeteer = require("puppeteer"); // ^22.6.0
    
    const url = "<Your URL>";
    
    let browser;
    (async () => {
      browser = await puppeteer.launch();
      const [page] = await browser.pages();
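      // The data is in the static HTML, so skip JS and block every request
      // except the page document itself to speed things up.
      await page.setJavaScriptEnabled(false);
      await page.setRequestInterception(true);
      page.on("request", req => {
        if (req.url() === url) {
          req.continue();
        } else {
          req.abort();
        }
      });
      await page.goto(url, {waitUntil: "domcontentloaded"});
      // ::-p-text() is a Puppeteer-specific selector that matches by text.
      const team = await page.$eval(
        "li::-p-text('Team:')",
        el => el.querySelector("a").textContent.trim()
      );
      console.log(team); // => Philadelphia Phillies (for the HTML in the question)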
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
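
If you want every label/value pair rather than just Team, another option is to have $$eval return each <li>'s textContent (plain strings serialize fine) and split on the first colon in Node. A minimal sketch; parseDetails is a hypothetical helper that assumes each item has the "Label: value" shape from the question:

```javascript
// Hypothetical helper: turns "Label: value" strings (as returned by
// els.map(el => el.textContent)) into an object keyed by label.
function parseDetails(texts) {
  const details = {};
  for (const text of texts) {
    const idx = text.indexOf(":");
    if (idx === -1) continue; // skip items without a "Label:" prefix
    const label = text.slice(0, idx).trim();
    // Collapse the newlines and indentation left over from the markup.
    const value = text.slice(idx + 1).replace(/\s+/g, " ").trim();
    details[label] = value;
  }
  return details;
}

// Strings shaped like the textContent of the <li>s in the question.
const texts = [
  "Sport: Baseball",
  "Team: \n             Philadelphia Phillies \n          ",
];

const details = parseDetails(texts);
console.log(details.Team); // Philadelphia Phillies
```

In a Puppeteer script, texts would come from page.$$eval(".item-sub-details ul li", els => els.map(el => el.textContent)). Splitting on the first colon only keeps values that themselves contain a colon intact.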
    

    Note that you don't need Puppeteer for this, since the data is in the static HTML. That's why the Puppeteer code above disables JS and blocks every request except the page itself. Plain fetch (built into Node 18+) and the lightweight HTML parser Cheerio are enough:

    const cheerio = require("cheerio"); // ^1.0.0-rc.12
    
    const url = "<Your URL>";
    
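    // fetch is built into Node 18+; no browser is launched here.
    fetch(url)
      .then(res => {
        if (!res.ok) {
          throw new Error(res.statusText);
        }
    
        return res.text();
      })
      .then(html => {
        const $ = cheerio.load(html);
        // Find the <li> containing "Team:" and read its link's text.
        const team = $("li:contains(Team:)").find("a").text().trim();
        console.log(team); // => Philadelphia Phillies (for the HTML in the question)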
      })
      .catch(err => console.error(err));
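
To check that the data really is in the static HTML before deciding a browser isn't needed, you can fetch the raw page, where no JavaScript runs, and look for the text you expect. A minimal sketch, assuming Node 18+ for the built-in fetch; inStaticHtml is a hypothetical helper, and a substring match is only a heuristic since the text could appear elsewhere on the page:

```javascript
// Hypothetical helper: does the server-rendered HTML (no JS executed)
// already contain the text we're after? Requires Node 18+ (global fetch).
async function inStaticHtml(url, needle) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  const html = await res.text();
  return html.includes(needle);
}
```

For example, inStaticHtml("<Your URL>", "Philadelphia Phillies").then(console.log). If it logs false, the content is probably rendered by JavaScript, and Puppeteer may be needed after all.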
    

    Timings (real/user/sys as reported by the shell's time command):

                            real     user     sys
    Fetch and Cheerio       1.158s   0.315s   0.043s
    Optimized Puppeteer     1.489s   0.650s   0.174s
    Unoptimized Puppeteer   2.260s   0.980s   0.263s

    "Unoptimized" means Puppeteer with JS enabled and no request interception.
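
Timings like these vary from run to run, especially when a network request is involved, so if you want to compare the approaches yourself from inside Node, one option is to time several runs and take the median. A minimal sketch; timeRuns is a hypothetical helper, and it measures wall-clock time only (closest to the real column), not user/sys CPU time, and excludes Node's startup:

```javascript
const { performance } = require("node:perf_hooks");

// Hypothetical helper: runs an async task `runs` times sequentially and
// returns the median wall-clock duration in milliseconds.
async function timeRuns(task, runs = 5) {
  if (runs < 1) throw new RangeError("runs must be >= 1");
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await task();
    durations.push(performance.now() - start);
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  // Odd count: middle value; even count: mean of the two middle values.
  return durations.length % 2
    ? durations[mid]
    : (durations[mid - 1] + durations[mid]) / 2;
}
```

For example, if the Cheerio code were wrapped in an async function scrapeWithCheerio (a hypothetical name), timeRuns(scrapeWithCheerio).then(ms => console.log(ms.toFixed(0) + " ms")) would print its median duration over five runs.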