I'm trying to scrape some information from an Instagram profile page with Nightmare.js (a PhantomJS derivative that uses Electron as its browser).
The goal is to get the alt attributes of all images on the profile (for example's sake, I focus only on the images before the "show more" button).
var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });

nightmare
  .goto('https://www.instagram.com/ackerfestival/')
  .evaluate(function () {
    let array = [...document.querySelectorAll('._icyx7')];
    return array.length;
  })
  .end()
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.error('Search failed:', error);
  });
This example works: the array has a length of 12, and the Electron browser opens and closes as expected. But if I change the return value to the array itself, the Electron browser never closes and I don't get any console.log output.
What am I doing wrong? I want to get all the information from the images in an array or object.
The problem you're hitting is that document.querySelectorAll() returns a NodeList of DOMElements. Neither of those object types serializes well, and the return value from .evaluate() has to be serialized across the IPC boundary. I'm betting you're getting an empty array on the other side of your .evaluate() call?
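To see why mapping to plain values helps, here is a small Node-only illustration. It assumes the result is serialized roughly the way JSON.stringify does it, which is an approximation of what crosses the IPC boundary rather than Nightmare's exact mechanism. FakeImage is a hypothetical stand-in for an <img> element: like real DOM properties, its alt value is exposed through a getter on the prototype, not stored as an own enumerable property, so JSON.stringify skips it.

```javascript
// FakeImage is a hypothetical stand-in for an <img> element. Real DOM
// properties such as `alt` are accessors on the prototype, not own
// enumerable properties, so JSON.stringify does not see them.
class FakeImage {
  #alt;
  constructor(alt) {
    this.#alt = alt;
  }
  get alt() {
    return this.#alt;
  }
}

const images = [new FakeImage('first post'), new FakeImage('second post')];

// Serializing the objects themselves loses the data we care about.
const rawJson = JSON.stringify(images);

// Mapping to plain strings before serializing keeps it.
const mappedJson = JSON.stringify(images.map((img) => img.alt));

console.log(rawJson); // [{},{}]
console.log(mappedJson); // ["first post","second post"]
```

Real DOM elements behave the same way under JSON.stringify, which is why returning plain strings or plain objects from .evaluate() is the safe choice.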
The easiest answer here is to map out what, specifically, you want from the NodeList. From the hip, something like the following should get the idea across:
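nightmare
  .goto('https://www.instagram.com/ackerfestival/')
  .evaluate(function () {
    // Map each element to a plain string so the result survives serialization
    return Array.from(document.querySelectorAll('._icyx7')).map(element => element.innerText);
  })
  .end()
  .then((innerTexts) => {
    // ... do something with the inner texts of each element
  })
  .catch((error) => console.error('Search failed:', error));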
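Since the original goal was the alt text of each image, the same idea can be applied to attributes. The sketch below is a hypothetical page-side helper, extractImageData, that returns plain { alt, src } objects. It assumes that the selector matches the <img> elements themselves; Instagram's generated class names such as ._icyx7 change over time, so check the current markup in devtools first.

```javascript
// Hypothetical page-side extractor for Nightmare's .evaluate().
// It runs in the page context, not in the Node script, so it must not
// reference outer variables; it only uses the page's `document`.
// Assumption: `selector` matches the <img> elements themselves.
function extractImageData(selector) {
  return Array.from(document.querySelectorAll(selector), (img) => ({
    // getAttribute returns null when the attribute is missing; normalize to ''
    alt: img.getAttribute('alt') || '',
    src: img.getAttribute('src') || '',
  }));
}
```

With the question's setup this could be called as .evaluate(extractImageData, '._icyx7'), because Nightmare passes any extra arguments after the function on to it. The .then() callback would then receive an array of plain objects instead of DOM elements.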