I'm currently working on some scraping with cheerio and nightmare. The reason I'm using both, and not just cheerio, is that I have to manipulate the site to get to the part I want to scrape, and I found nightmare very good at running those scripts.
So, right now I'm using nightmare to navigate until the info I need is displayed. After that, in evaluate(), I'm trying to return the current html so I can pass it to cheerio to do the scraping. The problem is that I don't know how to retrieve the html from the document object. Is there a property on the document that returns the full body?
Here is what I'm trying to do:
var Nightmare = require('nightmare');
var nightmare = Nightmare({show: true});
var express = require('express');
var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');
var app = express();
var urlWeb = "url";
var selectCity = "#ddl_city";

nightmare
  .goto(urlWeb)
  .wait(selectCity)
  .select('#ddl_city', '19')
  .wait(6000)
  .select('#ddl_theater', '12')
  .wait(1000)
  .click('#btn_enter')
  .wait('#aspnetForm')
  .evaluate(function () {
    // here is where I want to return the html body
    return document.html;
  })
  .then(function (body) {
    // loading html body to cheerio
    var $ = cheerio.load(body);
    console.log(body);
  });
This worked as the return value inside evaluate():
document.body.innerHTML
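
For anyone landing here later, below is a minimal sketch of how that fix might slot into the script above. The helper names (getBodyHtml, getFullHtml, fetchRenderedHtml) are made up for illustration and are not part of nightmare or cheerio. It also shows document.documentElement.outerHTML, a standard DOM property that returns the whole page including the head, in case the body alone is not enough. The function passed to evaluate() runs inside the page, so it can only rely on browser globals such as document.

```javascript
// Runs inside the page via nightmare.evaluate(), so only browser globals are available.
// Returns the markup inside <body>, which is what worked in the post above.
function getBodyHtml() {
  return document.body.innerHTML;
}

// Alternative: returns the whole document, including <html> and <head>.
function getFullHtml() {
  return document.documentElement.outerHTML;
}

// Hypothetical helper (not a nightmare API): drives an existing Nightmare
// instance to `url`, waits for `readySelector`, and resolves with the HTML.
function fetchRenderedHtml(nightmare, url, readySelector) {
  return nightmare
    .goto(url)
    .wait(readySelector)
    .evaluate(getFullHtml)
    .end(); // closes the Electron window once the queued steps finish
}

// Usage sketch, assuming nightmare and cheerio are installed and urlWeb is set:
//
// var Nightmare = require('nightmare');
// var cheerio = require('cheerio');
// fetchRenderedHtml(Nightmare({show: true}), urlWeb, '#aspnetForm')
//   .then(function (html) {
//     var $ = cheerio.load(html);
//     console.log($('title').text());
//   })
//   .catch(function (err) {
//     console.error('scrape failed:', err);
//   });
```

The select() and click() steps from the question would go between wait() and evaluate() in the same chain. Adding a catch() is worth it here, since a selector that never appears would otherwise fail silently.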