javascript node.js web-scraping cheerio

Cheerio: find tag with multiple specific criteria easily and elegantly?


I'm trying to scrape this page, https://liquipedia.net/dota2/Admiral, for all the <li> tags that have the title property and are inside a <ul> tag that is in turn within a div with class mw-parser-output. (I think that is what they're called in the HTML world? Like <tag property="...">.)

What would be the most elegant, simple way to do this with Cheerio? I know I could do this with some for loops and stuff, but if there was a simple way to do this, my code would be a lot cleaner.


Solution

  • I'm sure glad Cheerio is like jQuery. A simple selector like this should do:

    const li = $('div.mw-parser-output > ul > li[title]').toArray(); // Optionally turn the selected items into an array
    

    Explanation of the CSS selector:

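    1. div.mw-parser-output: matches a div element that has the class mw-parser-output. The dot marks a class selector.
    2. >: the child combinator. The element on its right must be an immediate child of the element on its left.
    3. ul: a plain ul element.
    4. li[title]: any li element, but only if it has a title attribute (in HTML these are called attributes, not properties).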

    Then toArray() turns the selection into a plain JavaScript array of elements.
    It's as simple as that.

    You could also get an array of the text of each li element with the following:

    // toArray() returns raw DOM nodes, so wrap each one with $() before calling .text()
    const arrayOfLiTexts = li.map(el => $(el).text());