I am using Apify to get data from JSON file links. Here is the JSON data:
<html>
<body>
<pre>
{"exhibitor-single":[{"firstname":"Ines","lastname":"Konwiarz","email":"[email protected]"}]}
</pre>
</body>
</html>
So I used the following code in an Apify Web Scraper task:
async function pageFunction(context) {
    const request = context.request;
    const $ = context.jQuery;
    var data = $('body > pre').html();
    var items = JSON.parse(data);
    return {
        Url: request.url,
        Last_Name: items[`exhibitor-single`].lastname,
        First_Name: items[`exhibitor-single`].firstname,
        Email: items[`exhibitor-single`].email
    };
}
The CSS selector used for the variable data is correct for the JSON data, but the function is not returning any data. Can anyone help me find what went wrong here? Thanks in advance.
From the pageFunction structure, I guess that you are using apify/web-scraper.
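As a side note, the likely reason the fields come back empty is that `exhibitor-single` holds an array, so `.lastname` has to be read from one of its elements rather than from the array itself. Below is a minimal sketch of that fix as a standalone function. The helper name `parseExhibitor`, the example URL, and the email address are placeholders of mine, not part of Apify or of your data.

```javascript
// Hypothetical helper (not an Apify API): parses the text taken from the
// <pre> element and returns the fields of the first exhibitor.
function parseExhibitor(rawText, url) {
    const parsed = JSON.parse(rawText);
    // "exhibitor-single" is an array, so pick an element before reading fields.
    const list = parsed['exhibitor-single'] || [];
    const first = list[0] || {};
    return {
        Url: url,
        Last_Name: first.lastname,
        First_Name: first.firstname,
        Email: first.email,
    };
}

// Sample input shaped like the JSON in the question (the email is a placeholder).
const sample = '{"exhibitor-single":[{"firstname":"Ines","lastname":"Konwiarz","email":"ines@example.com"}]}';
const record = parseExhibitor(sample, 'https://example.com/exhibitor.json');
console.log(record);
```

Inside the web-scraper pageFunction this could be called as `parseExhibitor($('body > pre').text(), request.url)`. Using `.text()` instead of `.html()` avoids getting HTML-escaped characters such as `&amp;` in the string passed to `JSON.parse`.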
If you just want to get data from JSON links, you can use apify/cheerio-scraper instead. It will use much less compute power, since it does not need to open a whole browser.
To get the JSON data, set up the pageFunction in Cheerio Scraper like this:
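async function pageFunction(context) {
    // Cheerio Scraper provides the parsed response body as context.json
    const { request, json } = context;
    // "exhibitor-single" is an array, so read its first entry
    const exhibitor = json['exhibitor-single'][0];
    return {
        Url: request.url,
        Last_Name: exhibitor.lastname,
        First_Name: exhibitor.firstname,
        Email: exhibitor.email
    };
}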
Cheerio Scraper supports only HTML responses by default, so you also need to add the following value to Additional mime types under Advanced configuration:
[
"application/json"
]
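If one of the links ever returns more than one entry in `exhibitor-single`, reading only the first element would drop the rest. Here is a rough sketch of a helper that maps every entry to a record and falls back to an empty list when the key is missing. The name `toRecords` and the sample values are made up for illustration. As far as I know the Apify scrapers also accept an array returned from pageFunction, but treat that as an assumption and check it against the actor's documentation.

```javascript
// Hypothetical helper (not an Apify API): turns the parsed JSON into one
// record per entry of the "exhibitor-single" array.
function toRecords(json, url) {
    const value = json && json['exhibitor-single'];
    // Fall back to an empty list if the key is missing or is not an array.
    const list = Array.isArray(value) ? value : [];
    return list.map((person) => ({
        Url: url,
        Last_Name: person.lastname,
        First_Name: person.firstname,
        Email: person.email,
    }));
}

// Made-up sample data with two entries, for illustration only.
const sampleJson = {
    'exhibitor-single': [
        { firstname: 'Ines', lastname: 'Konwiarz', email: 'ines@example.com' },
        { firstname: 'Alex', lastname: 'Example', email: 'alex@example.com' },
    ],
};
const records = toRecords(sampleJson, 'https://example.com/exhibitor.json');
console.log(records);
```

In the Cheerio Scraper pageFunction this would be called as `toRecords(json, request.url)`.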