Tags: javascript, jquery, web-scraping, cheerio

Scraping HTML Comments using jQuery / CheerioJS?


I have this snippet of HTML (https://pastebin.com/wbQwys8R), and my goal is to parse its HTML comments so I can put them in a dictionary. The following code:

    $("body").find("div.cl-entry").each((currIndex, currElement) => {
        /* Get the comments from each run */
    })

selects every div.cl-entry element, each of which contains markup like the pastebin snippet above. But how do I parse the HTML comments themselves?


Solution

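  • In Cheerio, the underlying DOM node's .children array holds every child node, including comments, so you can iterate over it and keep only the nodes whose type is 'comment'. (Cheerio's jQuery-style .children() method, by contrast, returns only element children.) For example: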

    const cheerio = require('cheerio');
    const $ = cheerio.load(`
    <div class='cl-info'>
      <!-- updated=Saturday, 13-Mar-2021 06:46:41 GMT -->
      <!-- id=985dad7f1491 -->
      <!-- status=active -->
      <!-- offline=no -->
      <!-- name=0-30 Kiwi-SDR Flornes, Norway LB8PI -->
      <!-- sdr_hw=KiwiSDR v1.438 ⁣ 📻 DRM ⁣ -->
    </div>`);
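    // [0] unwraps the Cheerio selection to the raw DOM node, whose
    // .children array includes text and comment nodes as well as elements
    for (const child of $('.cl-info')[0].children) {
        if (child.type === 'comment') {
            // .data is the text between <!-- and -->, whitespace included
            console.log(child.data);
        }
    }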
    

This logs the following (each comment's text keeps the whitespace that surrounded it inside <!-- and -->):

     updated=Saturday, 13-Mar-2021 06:46:41 GMT 
     id=985dad7f1491
     status=active
     offline=no
     name=0-30 Kiwi-SDR Flornes, Norway LB8PI 
     sdr_hw=KiwiSDR v1.438 ⁣ 📻 DRM ⁣