I've just upgraded my working Scrapy app hosted on Heroku to the heroku-20 stack. I'm now getting an error in my logs before my scraping application finishes.
Logs:
2021-08-25T14:15:49.867725+00:00 app[api]: Starting process with command `scrapy crawl main` by user [email protected]
2021-08-25T14:15:57.812969+00:00 heroku[run.7197]: State changed from starting to up
2021-08-25T14:15:57.758336+00:00 heroku[run.7197]: Awaiting client
2021-08-25T14:15:57.776747+00:00 heroku[run.7197]: Starting process with command `scrapy crawl main`
2021-08-25T14:37:11.126653+00:00 heroku[run.7197]: Client connection closed. Sending SIGHUP to all processes
2021-08-25T14:37:11.650022+00:00 heroku[run.7197]: Process exited with status 129
2021-08-25T14:37:11.850624+00:00 heroku[run.7197]: State changed from up to complete
I believe my problem relates to Heroku's limits on attached one-off dynos, which issue a timeout reset. I'm not sure whether this resets the dyno or just the shell session. https://devcenter.heroku.com/articles/limits#dynos
Do I need to change something in my code to refresh the timeout counter using a "keep-alive" strategy?
Edit: From the Heroku shell, I did see that the spider worked perfectly for about an hour (a few hundred items scraped), and then the shell session ended without any notice or error message. So I assume this was the SIGHUP interruption shown in the logs?
I solved my problem. I'm passing this on in case anyone else runs into it, as it was clearly a "rookie" mistake.
I was trying to run my app using the web console, which lets you run a bash command from the browser ("Run console" from the "More" dropdown in the upper-right corner).
Apparently the SIGHUP is sent when that "Run console" shell session times out after an hour. My app exited with code 129 (128 + 1, where 1 is SIGHUP's signal number) rather than the expected exit 0.
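For what it's worth, the 129 is consistent with death by SIGHUP: a process killed by signal N is conventionally reported with exit status 128 + N, and SIGHUP is signal 1. A quick sketch to confirm this in a plain POSIX shell (nothing Heroku-specific assumed):

```shell
# Kill a subshell with SIGHUP (signal 1) and inspect the reported status.
sh -c 'kill -HUP $$'
echo $?   # 128 + 1 = 129
```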
If I run the app from the CLI using:
heroku run [my start command]
It runs all the way to completion, and I get full logs and stdout in the CLI.
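One related note, assuming the standard Heroku CLI: `heroku run` keeps the session attached, so a dropped connection still sends SIGHUP to the dyno; `heroku run:detached` runs the one-off dyno in the background with its output going to the application logs, so it survives client disconnects. A sketch:

```shell
# Attached: output streams to your terminal; if the client
# connection closes, the dyno receives SIGHUP.
heroku run scrapy crawl main

# Detached: runs in the background, unaffected by client disconnects;
# the command prints the dyno name, and output goes to the app logs.
heroku run:detached scrapy crawl main
heroku logs --tail
```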