Tags: javascript, screen-scraping, cheerio

I am using cheerio to grab stats from https://www.nba.com/players/langston/galloway/204038, but I can't get the table data to show up.


[Image: the information I want to access]

No matter what I do, I just can't access the table of stats. I suspect it has to do with there being multiple tables, but I am not sure.

var cheerio = require("cheerio");

var axios = require("axios");

axios
  .get("https://www.nba.com/players/langston/galloway/204038")
  .then(function (response) {
    var $ = cheerio.load(response.data);

    console.log(
      $("player-detail").find("section.nba-player-stats-traditional").find("td:nth-child(3)").text()    
    );
  });

Solution

  • The HTML returned by your GET request doesn't contain the data or the table. When your browser loads the page, a script runs that pulls the data through API calls and creates most of the elements on the page.

    If you open the Chrome developer tools (CTRL+SHIFT+J), switch to the Network tab, and reload the page, you can see all of the requests taking place. The first one is the HTML that your axios GET request downloads. If you click on it, you can see that the HTML is very basic compared to what you see when you inspect the page.

    If you click on 'XHR', the list is filtered down to most of the API calls that fetch data. There's an interesting one for '204038_profile.json'. If you click on it, you can see the information I think you want in JSON format, which is much easier to use than parsing an HTML table. You can right-click on '204038_profile.json' and copy the full URL:

    https://data.nba.net/prod/v1/2019/players/204038_profile.json
    

    NOTE: Most websites will not like you using their data like this, so you might want to check what their policy is. They could make the data more difficult to access or change the URLs at any time.

    You might want to check out this question or this one about how to load the page and run the JavaScript to simulate a browser.

    The second one is particularly interesting and has an answer explaining how you can intercept and mutate requests from Puppeteer.
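
As a rough sketch of the JSON route described in the answer, the profile URL can be requested with axios directly, with no HTML parsing involved. The helper names `getPath` and `fetchProfile` are made up for illustration, and the key path in the usage comment is only a guess at the payload's layout; inspect the actual response in the network tab before relying on it. The URL itself may also stop working at any time, as the note in the answer warns.

```javascript
// URL copied from the network tab, as described in the answer.
var PROFILE_URL = "https://data.nba.net/prod/v1/2019/players/204038_profile.json";

// Hypothetical helper: walk a nested object along `path` (an array of keys),
// returning undefined instead of throwing if any step is missing.
function getPath(obj, path) {
  return path.reduce(function (node, key) {
    return node == null ? undefined : node[key];
  }, obj);
}

// Hypothetical helper: fetch the profile JSON.
// axios parses JSON responses automatically, so response.data is already an object.
function fetchProfile(url) {
  // Required lazily so getPath can be used on its own without axios installed.
  var axios = require("axios");
  return axios.get(url).then(function (response) {
    return response.data;
  });
}

// Example usage. The key path below is an ASSUMPTION about the payload layout,
// not something confirmed by the answer; adjust it after inspecting the JSON.
//
// fetchProfile(PROFILE_URL)
//   .then(function (profile) {
//     console.log(Object.keys(profile)); // see what the payload actually contains
//     console.log(getPath(profile, ["league", "standard", "stats", "latest"]));
//   })
//   .catch(function (err) {
//     console.error("Request failed:", err.message);
//   });
```

Because `getPath` returns `undefined` rather than throwing when a key is missing, a change in the payload layout shows up as an empty result instead of a crash.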