
How to fetch only the text from HTML links using cheerio


Hello, I have a webpage with HTML like this:

<div class="css-content">
   <div class="css-2aj">
      <img src="" >
      <div data-bn-type="text" id="/48" class="">Latest News</div>
   </div>
   <div class="css-6f9">
      <div class="css-content">
         <a data-bn-type="link" href="/en/blog/news/523hshhshhshhs3331adc0" class="css-1ej">US could be on cusp of new Covid surge</a>

         <a data-bn-type="link" href="/en/blog/news/423hshhshhshhs3331adc0" class="css-1ej">Stop sharing your vaccine cards on social media</>
            <a data-bn-type="link" href="/en/blog/news/2222hshhshhshhs3331adc0" class="css-1ej">Italians can be fined up to $60,000 for selling the world's 'most dangerous' cheese</a>

         <a data-bn-type="link" href="/en/blog/news/2223hshhshhshhs3331adc0" class="css-1ej">The Masked Singer' reveals the identity of The Phoenix<a/>

        
      </div>
   </div>
</div>

I want the results to look like this:

  • US could be on cusp of new Covid surge

  • Italians can be fined up to $60,000 for selling the world's 'most dangerous' cheese

  • The Masked Singer' reveals the identity of The Phoenix

This is what I have tried:

    var list = [];
    $('div[class="css-6f9"]').find('div > a').each(function (index, element) {
        list.push($(element).attr('href'));
    });

    console.log(list);

The result is an empty array.

I am completely new to this and have no idea how to fetch the text inside the <a></a> tags. Please help.


Solution

  • Try this.

    Don't use the required cheerio module itself as $. Load the HTML with cheerio.load() and use the function it returns as $:

    const html = `<div class="css-content">
    <div class="css-2aj">
       <img src="" >
       <div data-bn-type="text" id="/48" class="">Latest News</div>
    </div>
    <div class="css-6f9">
       <div class="css-content">
          <a data-bn-type="link" href="/en/blog/news/523hshhshhshhs3331adc0" class="css-1ej">US could be on cusp of new Covid surge</a>
    
          <a data-bn-type="link" href="/en/blog/news/423hshhshhshhs3331adc0" class="css-1ej">Stop sharing your vaccine cards on social media</>
             <a data-bn-type="link" href="/en/blog/news/2222hshhshhshhs3331adc0" class="css-1ej">Italians can be fined up to $60,000 for selling the world's 'most dangerous' cheese</a>
    
          <a data-bn-type="link" href="/en/blog/news/2223hshhshhshhs3331adc0" class="css-1ej">The Masked Singer' reveals the identity of The Phoenix<a/>
    
         
       </div>
    </div>
    </div>`;
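    const cheerio = require('cheerio');
    // Parse the markup; load() returns the query function used as $.
    const $ = cheerio.load(html);
    const list = [];
    // Collect the trimmed text of every <a> that is a direct child of .css-content.
    $('.css-content > a').each(function () {
      list.push($(this).text().trim());
    });
    // Drop any empty strings (for example, from anchors with no text).
    console.log(list.filter(Boolean));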
    
