I'm writing a web scraping application in TypeScript with Puppeteer. I'm "attaching" a JavaScript file with utility functions to the page instance to make the scraping easier (this is done with Puppeteer's page.addScriptTag function; see the Puppeteer API docs). Here's what one of the utility functions on the page might look like:
// functions.ts
export const getLink = (node: Element) => {
  const link = node.querySelector("a");
  return link ? link.href : null;
};
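For context, the attach step looks roughly like the sketch below. The ScriptTagPage interface is only a minimal stand-in for puppeteer.Page so the snippet is self-contained, attachHelpers is a hypothetical helper name, and the ./dist/functions.js path is an assumption about where the compiled file ends up.

```typescript
// attach.ts
// Minimal structural stand-in for puppeteer.Page (an assumption, used only to
// keep this sketch self-contained; a real puppeteer.Page satisfies it).
interface ScriptTagPage {
  addScriptTag(options: { path: string }): Promise<unknown>;
}

// Hypothetical helper: injects the compiled utility file into the page so its
// top-level functions can be called from code run through page.evaluate.
const attachHelpers = async (
  page: ScriptTagPage,
  helperPath: string = "./dist/functions.js" // assumed build output location
): Promise<void> => {
  await page.addScriptTag({ path: helperPath });
};
```

The file is attached once per page, before any page.evaluate call that relies on the utilities.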
Then you can use the functions inside page.evaluate:
// process.ts
import * as puppeteer from "puppeteer";
import { getLink } from "../functions";

interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate((selector) => {
    const node = document.querySelector(selector);
    const link = node ? getLink(node) : null; // I'm using the function here.
    return link;
  }, selector);
The problem is that when I do this, the imports fail during development. I believe this is because the compiled import and export syntax does not work inside of Chrome. Here's the error from my browser:
Could not get links. Error: Evaluation failed: ReferenceError: src_1 is not defined
at __puppeteer_evaluation_script__:2:20
at ExecutionContext._evaluateInternal (/Users/harrisoncramer/Desktop/Code/projects/gql3.0_schedulers/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:217:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async ExecutionContext.evaluate (/Users/harrisoncramer/Desktop/Code/projects/gql3.0_schedulers/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:106:16)
Evaluation failed: ReferenceError: src_1 is not defined
at __puppeteer_evaluation_script__:2:20
I've got a hacky workaround: I run the functions.ts file through the compiler and then remove all of the export keywords from the resulting functions.js file. Then I remove all of the import statements from the process.ts file, like this:
// functions.js
const getLink = (node) => {
  const link = node.querySelector("a");
  return link ? link.href : null;
};

// process.ts
// Turning off this import...
// import { getLink } from "../functions";
interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate((selector) => {
    const node = document.querySelector(selector);
    const link = node ? getLink(node) : null; // I'm using the function here.
    return link;
  }, selector);
This, however, breaks the type checking during development. What's a better way to solve this problem? How can I load compiled JavaScript functions onto the page without breaking TypeScript's type checking?
Anything inside of page.evaluate runs in the page's own context, essentially as if you had typed it into Chrome's DevTools console. Imports won't work in this context, at least not the way you're attempting it. You have to explicitly pass the function into the context, like this:
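const getLink = (node: Element): string | null => {
  const link = node.querySelector("a");
  return link ? link.href : null;
};

// process.ts
// Turning off this import...
// import { getLink } from "../functions";
interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate(
    (selector, getLinkSource) => {
      // Arguments must be serializable, so the function arrives as source text
      // and is rebuilt here inside the page.
      const getLink = new Function(`return (${getLinkSource})`)();
      const node = document.querySelector(selector);
      return node ? getLink(node) : null; // I'm using the function here.
    },
    selector,
    getLink.toString()
  );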
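To see why the imported binding disappears, it helps to remember that the page function is sent to the browser as source text, so anything it closes over on the Node side is not carried along. Arguments to page.evaluate must also be serializable, which means a helper has to travel as its source text rather than as a function value. Here is a minimal sketch that simulates both points in plain TypeScript, without a browser. Note that runInFreshScope is a hypothetical stand-in for what page.evaluate does, not a Puppeteer API, and double is just a placeholder for a utility like getLink.

```typescript
// Hypothetical stand-in for page.evaluate: rebuilds the function from its
// source text in a fresh scope, so closure variables from this file are lost.
const runInFreshScope = (
  fn: (...args: any[]) => unknown,
  ...args: unknown[]
): unknown => {
  const rebuilt = new Function(`return (${fn.toString()})`)();
  return rebuilt(...args);
};

const demo = () => {
  // Local helper, analogous to an imported utility such as getLink.
  const double = (n: number): number => n * 2;

  // 1. Closing over the helper: the rebuilt function cannot see `double`.
  let closureError = "";
  try {
    runInFreshScope((n: number) => double(n), 21);
  } catch (err) {
    closureError = (err as Error).name;
  }

  // 2. Passing the helper's source text as an argument and rebuilding it.
  const viaSource = runInFreshScope(
    (n: number, helperSource: string) => {
      const helper = new Function(`return (${helperSource})`)();
      return helper(n);
    },
    21,
    double.toString()
  );

  return { closureError, viaSource };
};

const result = demo();
console.log(result); // closureError is "ReferenceError", viaSource is 42
```

One consequence of passing the source this way is that the helper is only referenced on the Node side (through toString), so a normal import of it in process.ts keeps resolving and type-checking. The rebuilt copy inside the page is untyped, though, and it only works if the helper does not itself depend on other imports or outer variables.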