Deployment of react-snap
on a CRA app has been mostly painless, giving huge page load speed boosts and requiring zero specialized configuration.
However, I'm seeing occasional issues with deploys (both locally and from netlify) only crawling a single page and then appearing done. Like this:
Normal result (maybe 50% of the time) means crawling ~50 pages and then everything else successfully finishes.
I've tried limiting concurrency to 1 without improvement. What other tools can I use to figure this problem out or configuration options can I include to fix this?
Figured this out: Webpack was setting PUBLIC_URL
to the production domain, and new deploys were looking on that domain for a JS file that looked like main.1234abcd.js
, using a hash of the js file for cache busting. This didn't exist on the production domain before it was deployed so loading the page failed and no links were detected.
Setting the JS links to root-relative URL (i.e. /static/js/main.1234abcd.js
) loaded the JS correctly from the snap-created server and allowed it to be crawled correctly.
In addition, it was helpful to debug via the anchor crawling section in react-snap here: https://github.com/stereobooster/react-snap/blob/master/src/puppeteer_utils.js#L108-L119