javascript node.js regex html-parsing

How to extract text from HTML tags in Node.js?


I have HTML as a string in Node.js, as follows:

var htmlText = `<div class="X7NTVe">
        <a class="tHmfQe" href="/link1">
            <div class="am3QBf">
                <div>
                    <span>
                        <div class="BNeawe deIvCb AP7Wnd">
                            <span dir="rtl">My First Text</span>
                        </div>
                    </span>
                </div>
            </div>
        </a>
        <div class="HBTM6d XS7yGd">
            <a href="/anotherLink1">
                <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
            </a>
        </div>
    </div>
    <div class="x54gtf"></div>
    <div class="X7NTVe">
        <a class="tHmfQe" href="/link2">
            <div class="am3QBf">
                <div>
                    <span>
                        <div class="BNeawe deIvCb AP7Wnd">
                            <span dir="rtl">My Second Text</span>
                        </div>
                    </span>
                </div>
            </div>
        </a>
        <div class="HBTM6d XS7yGd">
            <a href="/anotherLink2">
                <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
            </a>
        </div>
    </div>
    <div class="x54gtf"></div>`

Now I want to extract the text from it as an array. For the example above it must return My First Text and My Second Text. How can I do it?

Note: I want to do it in Node.js, not in browser JavaScript.


Solution

  • With cheerio:

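    const cheerio = require('cheerio')
    // Load the HTML string from the question (htmlText)
    let $ = cheerio.load(htmlText)
    // Take each span[dir] that is a direct child of the target div and read its text
    let strings = $('div[class="BNeawe deIvCb AP7Wnd"]>span[dir]')
                  .get().map(span => $(span).text())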
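
  • Without a dependency: since the question is also tagged regex, here is a minimal sketch that uses a regular expression instead of a parser. It assumes the markup stays as regular as in the question: each wanted string sits in a `<span dir="...">` with no nested tags and no other attributes. The name `extractSpanTexts` and the trimmed sample input are illustrative, not part of the original post. Regular expressions are fragile for HTML in general, so a parser such as cheerio is the safer choice when the markup can vary.

```javascript
// Hypothetical sample input, trimmed from the question's htmlText.
const htmlText = `
<div class="BNeawe deIvCb AP7Wnd">
    <span dir="rtl">My First Text</span>
</div>
<div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
<div class="BNeawe deIvCb AP7Wnd">
    <span dir="rtl">My Second Text</span>
</div>`;

// Matches <span dir="..."> elements whose content has no nested tags,
// capturing the text between the opening and closing tag.
const spanText = /<span\s+dir="[^"]*"\s*>([^<]*)<\/span>/g;

// Returns the trimmed text of every matching span, in document order.
function extractSpanTexts(html) {
  return Array.from(html.matchAll(spanText), match => match[1].trim());
}

const strings = extractSpanTexts(htmlText);
console.log(strings); // [ 'My First Text', 'My Second Text' ]
```

    The `&gt;` divs are skipped because they contain no `<span dir>` element. HTML entities inside a matched span would be returned undecoded, which a parser would handle for you.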