
Cheerio does not return items from the given path


My main objective is to use cheerio to scrape the titles from this IMDb ranking:

https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv

However, even though I followed cheerio's documentation and used the exact HTML path of the listed titles, I still get back seemingly random, confusing objects, like:

  'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {}
  },
  '80': <ref *81> Element {
    parent: Element {
      parent: [Element],
      prev: [Text],
      next: [Text],
      startIndex: null,
      endIndex: null,
      children: [Array],
      name: 'tbody',
      attribs: [Object: null prototype],
      type: 'tag',
      namespace: 'http://www.w3.org/1999/xhtml',
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype]
    },
    prev: Text {
      parent: [Element],
      prev: [Element],
      next: [Circular *81],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    next: Text {
      parent: [Element],
      prev: [Circular *81],
      next: [Element],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    startIndex: null,
    endIndex: null,
    children: [
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text]
    ],
    name: 'tr',
    attribs: [Object: null prototype] {},
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {}
  },
  '81': <ref *82> Element {
    parent: Element {
      parent: [Element],
      prev: [Text],
      next: [Text],
      startIndex: null,
      endIndex: null,
      children: [Array],
      name: 'tbody',
      attribs: [Object: null prototype],
      type: 'tag',
      namespace: 'http://www.w3.org/1999/xhtml',
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype]
    },
    prev: Text {
      parent: [Element],
      prev: [Element],
      next: [Circular *82],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    next: Text {
      parent: [Element],
      prev: [Circular *82],
      next: [Element],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    startIndex: null,
    endIndex: null,
    children: [
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text]
    ],
    name: 'tr',
    attribs: [Object: null prototype] {},
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {}
  },
  '82': <ref *83> Element {
    parent: Element {
      parent: [Element],
      prev: [Text],
      next: [Text],
      startIndex: null,
      endIndex: null,
      children: [Array],
      name: 'tbody',
      attribs: [Object: null prototype],
      type: 'tag',
      namespace: 'http://www.w3.org/1999/xhtml',
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype]
    },
    prev: Text {
      parent: [Element],
      prev: [Element],
      next: [Circular *83],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    next: Text {
      parent: [Element],
      prev: [Circular *83],
      next: [Element],
      startIndex: null,
      endIndex: null,
      data: '\n\n  ',
      type: 'text'
    },
    startIndex: null,
    endIndex: null,
    children: [
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text], [Element],
      [Text]
    ],
    name: 'tr',
    attribs: [Object: null prototype] {},
    type: 'tag',
    namespace: 'http://www.w3.org/1999/xhtml',
    'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {}
  },

Code:

import * as cheerio from 'cheerio';
import axios from 'axios';
import fs from 'fs';


axios("https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv").then(res => {
    const data = res.data;
    const $ = cheerio.load(data);

    var cheerioData = $('.lister-list>tr').each((i, e) => {
        const title = $(e).find('.titleColumn a').text();
        console.log(title);
    })
    console.log(cheerioData);
})
 
 

I really don't understand what I'm doing wrong, since the path is completely correct. Can anybody help me?


Solution

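  • .each returns the Cheerio selection itself (for chaining), so console.log(cheerioData) prints the matched Element nodes rather than the titles. To get an array of text, either spread the selection into a plain array and use Array#map, or use Cheerio's .map followed by .get() or .toArray().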

    For example, with spread and vanilla JS Array#map:

    import axios from "axios";
    import * as cheerio from "cheerio"; // namespace import; recent cheerio releases have no default export
    
    const url = "<Your URL>";
    
    axios(url).then(res => {
      const $ = cheerio.load(res.data);
      const text = [...$(".lister-list > tr")].map(e =>
        $(e).find(".titleColumn a").text().trim()
      );
      console.log(text);
    })
    

    Alternatively, call .get() or .toArray() after Cheerio's .map, whose callback receives the index as its first argument and the element as its second:

    const text = $(".lister-list > tr").map((i, e) =>
      $(e).find(".titleColumn a").text().trim()
    ).get();
    

    If you want to use .each, you can .push() each text string onto a vanilla array, but this isn't as clean as .map, which exists to abstract away this pattern:

    const text = [];
    $(".lister-list > tr").each((i, e) => {
      text.push($(e).find(".titleColumn a").text().trim());
    });
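
To check the difference between .each and .map without hitting the network, here is a minimal sketch that runs the same selectors against an inline HTML fixture. The fixture and its two titles are made up for illustration and only mimic the table layout the selectors above assume; the live page's markup may differ.

```typescript
import * as cheerio from "cheerio";

// Hypothetical fixture: a tiny table shaped like the one the selectors expect.
const html = `
<table>
  <tbody class="lister-list">
    <tr><td class="titleColumn"> <a href="/title/1/">First Show</a> </td></tr>
    <tr><td class="titleColumn"> <a href="/title/2/">Second Show</a> </td></tr>
  </tbody>
</table>`;

const $ = cheerio.load(html);
const rows = $(".lister-list > tr");

// .each returns the same Cheerio selection for chaining, so logging its
// return value prints Element objects, not the titles.
const returned = rows.each(() => {});
const eachReturnsSelection = returned === rows;

// .map(...).get() collects the callback's return values into a plain array.
const titles: string[] = rows
  .map((_i, el) => $(el).find(".titleColumn a").text().trim())
  .get();

console.log(eachReturnsSelection);
console.log(titles);
```

If the selectors match, titles should hold one trimmed string per row. An empty array would suggest that the markup no longer matches the selectors rather than a problem with .map itself.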