Tags: javascript, node.js, web-scraping, cheerio

Using Cheerio to scrape for information


I'm using Node.js with the Cheerio package to collect information from a website. I can iterate over the elements, but I'm struggling to extract the text and store it in an array.

The text I want to collect in this example would be "George Ezra - Wanted On Voyage (LP, Album + CD, Album)"

I am currently iterating through elements with the class "shortcut_navigable" and want to store each title in an array.

[Screenshot: HTML elements in the webpage]

So far, I have not been able to collect the text with this code:

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const articles = []
        $('.shortcut_navigable', html).each(function() {
            const title = $(this).find('a').has('.item_description').text()
            articles.push({
                title,
            })
        })
        console.log(articles)
    }).catch(err => console.log(err))

I would like to store all titles from the elements with the class 'item_description_title' in an array, and later be able to store other information as well.

Any advice and help is much appreciated, thank you.


Solution

  • If I'm guessing your URL correctly, .has() doesn't seem necessary. You can use the class name in .find() directly.

    Here's an example to get you started:

    const axios = require("axios"); // 1.4.0
    const cheerio = require("cheerio"); // 1.0.0-rc.12
    
    const url = "https://www.discogs.com/sell/release/5840245?ev=rb";
    
    axios
      .get(url)
      .then(({data: html}) => {
        const $ = cheerio.load(html);
        const data = [...$(".shortcut_navigable")].map(e => ({
          title: $(e).find(".item_description_title").text().trim(),
          seller: $(e).find(".seller_block a").text().trim(),
          price: +$(e).find("[data-pricevalue]").data("pricevalue"),
          shipping: $(e).find(".item_shipping").contents().get(0).data.trim(),
          // ...etc...
        }));
        console.log(data);
      })
      .catch((err) => console.error(err));
    

    As an aside, it's a good idea to post text rather than screenshots. Axios and cheerio don't execute JS, so the dev tools shown in your screenshot can be misleading: they show elements created dynamically, after page load. view-source: provides a better representation of what you're likely to get with axios (or simply print the response body that axios delivers). Luckily, JS doesn't appear to be a factor on the part of the site you're working with at this time, but it's worth keeping in mind.