javascript node.js web-scraping cheerio

Scraping website with nodejs and getting an empty result


I don't understand why I get no result when scraping a basketball-reference page. I want to get the Pace value from https://www.basketball-reference.com/boxscores/202304200GSW.html. There are two Pace cells, but both hold the same value, so the first one is fine.

The team names come back correctly, but pace comes back as 0 instead of the actual number.

const cheerio = require("cheerio");
const axios = require("axios");
const match_data = [];

async function sr_tables() {
    try {
        const boxscore_url = "https://www.basketball-reference.com/boxscores/202304200GSW.html";
        const response = await axios.get(boxscore_url);
        const $ = cheerio.load(response.data);

        // Home and Away tags
        let awayTeamTag = null;
        let homeTeamTag = null;
        let awayTeamName = null;
        let homeTeamName = null;
        $('strong a').each((i, el) => {
            if (i === 0) {
                awayTeamTag = $(el).attr('href').slice(7, 10);
                awayTeamName = $(el).text();
            } else if (i === 1) {
                homeTeamTag = $(el).attr('href').slice(7, 10);
                homeTeamName = $(el).text();
                return false; // break the loop after the second iteration
            }
        });

        // This is the line that yields 0
        const pace = Number($('table#four_factors tbody tr:first-child td[data-stat="pace"]').text());

        match_data.push({homeTeamName, awayTeamName, pace});
        console.log(match_data);
    } catch (error) {
        console.error(error);
    }
}

sr_tables();

The result I get is

[
  {
    homeTeamName: 'Golden State Warriors',
    awayTeamName: 'Sacramento Kings',
    pace: 0
  }
]

Solution

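  • Although the data is added to the page by JS after it loads, it's already present in the static HTML you requested, so you can get it without browser automation. It sits inside an HTML comment, though, so cheerio sees it as a single comment node rather than as elements, and your selector matches nothing.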
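The `pace: 0` in the question is what an empty match looks like: `.text()` on a selection with no elements returns an empty string, and `Number("")` is `0` rather than `NaN`. The snippet below is a minimal sketch of that coercion. `toNumberOrNull` is a hypothetical helper, not part of cheerio, shown as one way to make a missing cell visible instead of silently turning it into 0.

```javascript
// Stand-in for what .text() returns when a selector matches nothing.
const emptyText = "";

const viaNumber = Number(emptyText);                // 0, which hides the missing data
const viaParseFloat = Number.parseFloat(emptyText); // NaN

// Hypothetical helper: treat a blank or non-numeric cell as missing.
function toNumberOrNull(text) {
  const trimmed = text.trim();
  if (trimmed === "") return null;
  const value = Number(trimmed);
  return Number.isFinite(value) ? value : null;
}

console.log(viaNumber, viaParseFloat);   // 0 NaN
console.log(toNumberOrNull(emptyText));  // null
console.log(toNumberOrNull("100.9"));    // 100.9
```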

    The data is in two tags inside an HTML comment, each of which looks like this:

    <td class="right " data-stat="pace" >100.9</td>
    

    Here's one way to get it, by traversing the document's comment nodes:

    const axios = require("axios");
    const cheerio = require("cheerio"); // ^1.0.0-rc.12
    
    axios("<Your URL>")
      .then(({data: html}) => {
        const $ = cheerio.load(html);
        const {data: comment} = [...$("*").contents()]
          .find(e => e.type === "comment" && e.data.includes('data-stat="pace"'));
        const pace = [...cheerio.load(comment)('td[data-stat="pace"]')]
          .map(e => $(e).text());
        console.log(pace); // => [ '100.9', '100.9' ]
      })
      .catch(err => console.error(err));
    

    If you're sure these tags are the same and you're happy with just the first, you can use:

    const pace = cheerio.load(comment)('td[data-stat="pace"]').first().text();
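To see the "data hidden in a comment" idea in isolation, here is a dependency-free sketch that runs without a network request. It swaps cheerio for plain regular expressions, which is fine for a quick check but not a robust way to parse HTML. `sampleHtml` is a made-up stand-in for the real page and `paceValuesFromComments` is a hypothetical helper name.

```javascript
// Made-up sample: the pace cells live inside an HTML comment,
// so an element selector on the outer document would not find them.
const sampleHtml = `
<div>
<!--
<table id="four_factors"><tbody>
<tr><td class="right " data-stat="pace" >100.9</td></tr>
<tr><td class="right " data-stat="pace" >100.9</td></tr>
</tbody></table>
-->
</div>`;

// Hypothetical helper: scan every comment body for pace cells.
function paceValuesFromComments(html) {
  const commentPattern = /<!--([\s\S]*?)-->/g;
  const cellPattern = /<td[^>]*data-stat="pace"[^>]*>([^<]*)<\/td>/g;
  const values = [];
  for (const [, body] of html.matchAll(commentPattern)) {
    for (const [, text] of body.matchAll(cellPattern)) {
      values.push(Number(text.trim()));
    }
  }
  return values;
}

const paceValues = paceValuesFromComments(sampleHtml);
console.log(paceValues); // [ 100.9, 100.9 ]
```

For real pages, the cheerio approach above is the safer choice, since it parses the comment body as HTML instead of pattern-matching it.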