Tags: javascript, node.js, web-scraping, nightmare

Nightmare.js: scrape multiple elements with querySelectorAll


I'm trying to scrape some information from an Instagram profile page with Nightmare.js (a PhantomJS-style browser automation library that uses Electron as its browser).

The goal is to get the alt attributes of all images on the profile (for the example's sake, I focus only on the images shown before the "show more" button).

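var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true }); // show: true opens a visible Electron window

nightmare
  .goto('https://www.instagram.com/ackerfestival/')
  .evaluate(function () {
    // This function runs inside the page, not in Node
    let array = [...document.querySelectorAll('._icyx7')];
    return array.length; // a plain number is sent back to Node
  })
  .end() // close the Electron window once the chain finishes
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.error('Search failed:', error);
  });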
  

This example works: the array has a length of 12, and the Electron browser opens and closes as expected. But if I change the return value to just the array, the Electron browser never closes and nothing is logged.

What am I doing wrong? I want to collect the information from all the images into an array or object.


Solution

  • The problem you're hitting is that document.querySelectorAll() returns a NodeList of DOM elements. Neither of those object types serializes well, and the return value of .evaluate() has to be serialized across the IPC boundary between the browser page and your Node script. I'd bet you're getting an empty array (or nothing at all) back from your .evaluate() call.

    The easiest fix is to map the NodeList to exactly the data you want before returning it. Off the top of my head, something like the following should get the idea across:

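    nightmare
      .goto('https://www.instagram.com/ackerfestival/')
      .evaluate(function(){
        // Return plain strings, not DOM nodes, so the result survives serialization
        return Array.from(document.querySelectorAll('._icyx7')).map(element => element.innerText);
      })
      .end()
      .then((innerTexts) => {
        // ... do something with the inner texts of each element
      })
      .catch((error) => console.error('Scrape failed:', error));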
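A minimal, self-contained Node sketch of the serialization point above, assuming the IPC layer behaves roughly like `JSON.stringify` (the actual Electron/Nightmare mechanism may differ). `FakeImageElement` and the sample data are hypothetical stand-ins for the matched `<img>` elements. Like real DOM elements, the stand-in exposes `alt` and `src` through getters on its prototype rather than as own properties, which is why serializing the objects directly loses everything.

```javascript
// Hypothetical stand-in for a DOM <img> element: its data lives behind
// prototype getters, not own enumerable properties (as with real DOM nodes).
class FakeImageElement {
  #alt;
  #src;
  constructor(alt, src) {
    this.#alt = alt;
    this.#src = src;
  }
  get alt() {
    return this.#alt;
  }
  get src() {
    return this.#src;
  }
}

// Hypothetical placeholders for what querySelectorAll('._icyx7') might match.
const elements = [
  new FakeImageElement('first image alt text', 'https://example.com/1.jpg'),
  new FakeImageElement('second image alt text', 'https://example.com/2.jpg'),
];

// Serializing the elements themselves: JSON only sees own enumerable
// properties, so each element collapses to an empty object.
const rawJson = JSON.stringify(elements);
console.log(rawJson); // [{},{}]

// Mapping to plain objects first keeps the data you actually want.
const mapped = elements.map((img) => ({ alt: img.alt, src: img.src }));
const mappedJson = JSON.stringify(mapped);
console.log(mappedJson);
```

In the page itself, the equivalent inside `.evaluate()` would be `return Array.from(document.querySelectorAll('._icyx7')).map(img => ({ alt: img.alt, src: img.src }));`, assuming `._icyx7` actually matches the profile's `<img>` elements (not verified here). Returning an array of plain objects gives you both the alt text and anything else you need in one structure.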