I am trying to write a Scrapy spider with Playwright, using the scrapy-playwright module. The spider works fine when I run it manually (scrapy crawl my_spider), but when I start the spider process via PM2, it just hangs and does not work as it should.
The problem is not in my code or in my PM2 config: spiders that do not use Playwright run successfully under PM2. The problem is in Playwright itself.
I found the line of code where the process hangs: https://github.com/microsoft/playwright-python/blob/main/playwright/async_api/_context_manager.py#L40
After that, I also tried writing a spider with the synchronous Playwright API. This spider also starts successfully when run manually, but it likewise freezes under PM2.
The synchronous Playwright hangs on this line: https://github.com/microsoft/playwright-python/blob/main/playwright/sync_api/_context_manager.py#L88
I don't understand why this happens or how to solve it: my Playwright spiders start successfully when run manually, but freeze under PM2.
Could you please help me with this problem?
I had a similar problem: PM2 was running an Apache Airflow Celery worker that executed a Playwright task hourly, and the job hung.
I suspect the cause is environment variables left behind by Byobu and PM2. What worked for me was wrapping the Python command in a bash script and sanitizing the environment like the following:
#!/bin/bash
[ "$HOME" != "" ] && exec -c "$0"  # re-exec this script with a cleared environment
# Your python command should come below:
/home/ubuntu/usr/venv-3.10-airflow/bin/airflow celery worker
The line after the shebang clears the environment variables by re-executing the script with exec -c (as suggested in "Sanitize environment with command or bash script?"). Now you can run this wrapper script with the pm2 start command. My worker has been running fine for about 24 hours as of now. I hope this works for you too.
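If you prefer not to re-exec the wrapper, the same sanitizing idea can be sketched with env -i, which starts the child process with an empty environment and passes through only the variables you explicitly whitelist. JUNK_VAR below is just a hypothetical stand-in for whatever Byobu/PM2 leaves behind:

```shell
#!/bin/bash
# Simulate a variable leaked into the environment by PM2/Byobu.
export JUNK_VAR="leaked"

# env -i clears the environment; only the variables listed after it
# (here HOME and PATH) reach the child process, so JUNK_VAR is gone.
env -i HOME="$HOME" PATH="/usr/bin:/bin" \
  sh -c 'echo "JUNK_VAR=[$JUNK_VAR] HOME=[$HOME]"'
```

You would replace the sh -c '…' demo with your actual worker command (e.g. the airflow celery worker invocation above), keeping only the variables it genuinely needs.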