I want to get the whole HTML of an element, not just its text.
Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({
        url: // address,
        uniqueKey: makeid(100)
    });
    const handlePageFunction = async ({ request, $ }) => {
        const content_to = $('.class');
    };
    // Set up the crawler, passing a single options object as an argument.
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction,
    });
    await crawler.run();
});
When I try this, content_to holds a complex object. I know I can extract the text from the content_to variable using .text(), but I need the whole HTML, tags included. What should I do?
If I understand you correctly, you could just use .html() instead of .text(). This way you will get the inner HTML of the element instead of its inner text.
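To make the difference concrete, here is a minimal sketch. The helper name describeElement is made up for illustration; it assumes $ is the Cheerio object that CheerioCrawler passes to handlePageFunction. It also shows $.html(el), which in Cheerio should give you the outer HTML (the element's own tag included), in case inner HTML is not enough.

```javascript
// Hypothetical helper (not part of Apify or Cheerio): compares the three
// ways of reading an element. `$` is assumed to be the Cheerio object
// handed to handlePageFunction.
function describeElement($, selector) {
    const el = $(selector);
    return {
        text: el.text(),       // text content only, no tags
        innerHtml: el.html(),  // markup inside the first matched element
        outerHtml: $.html(el), // markup including the matched element's own tag
    };
}
```

Inside the handler from the question this could be called as describeElement($, '.class'), and you would then pick whichever field you need.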
Another thing to mention: you could also add body to the handlePageFunction argument object:
const handlePageFunction = async ({ request, body, $ }) => {
body would then hold the whole raw HTML of the page.
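A rough sketch of how that could look, assuming you want both the raw response and Cheerio's own serialization of the page. The helper name collectPageHtml and the field names are made up for illustration; calling $.html() with no arguments should serialize the whole parsed document, which may differ slightly from body because Cheerio normalizes the markup while parsing.

```javascript
// Hypothetical helper: takes the same argument object that
// handlePageFunction receives and returns the page HTML in two forms.
function collectPageHtml({ request, body, $ }) {
    return {
        url: request.url,
        rawHtml: body,        // response body exactly as the crawler received it
        parsedHtml: $.html(), // whole document as re-serialized by Cheerio
    };
}

// Possible use inside the crawler (sketch):
// const handlePageFunction = async (context) => {
//     const page = collectPageHtml(context);
//     console.log(page.rawHtml);
// };
```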