
Preferred way to run Scrapyd in the background / as a service


I am trying to run Scrapyd on a virtual Ubuntu 16.04 server, to which I connect via SSH. When I start Scrapyd by simply running

$ scrapyd

I can connect to the web interface by going to http://82.165.102.18:6800.

However, once I close the SSH connection, the web interface is no longer available. I therefore think I need to run Scrapyd in the background, as a service.

After some research I came across a few proposed solutions:

  • daemon (sudo apt install daemon)
  • screen (sudo apt install screen)
  • tmux (sudo apt install tmux)

Does someone know what the best / recommended solution is? Unfortunately, the Scrapyd documentation is rather thin and outdated.

For some background, I need to run about 10-15 spiders on a daily basis.


Solution

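  • Set up Scrapyd as a systemd service. First create the unit file:

    sudo nano /lib/systemd/system/scrapyd.service
    

    Then paste the following, replacing the user, group, and working directory with your own values: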

    [Unit]
    Description=Scrapyd service
    After=network.target
    
    [Service]
    User=<YOUR-USER>
    Group=<USER-GROUP>
    WorkingDirectory=/any/directory/here
    ExecStart=/usr/local/bin/scrapyd
    
    [Install]
    WantedBy=multi-user.target
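
If Scrapyd is installed somewhere other than /usr/local/bin (for example inside a virtualenv), ExecStart has to point at that location instead. As a rough sketch, the unit file above could also be generated from a small shell function. The function name render_scrapyd_unit and its four arguments are my own placeholders, not part of Scrapyd or systemd:

```shell
#!/bin/sh
# Print a scrapyd unit file to stdout.
# Hypothetical helper: $1 = user, $2 = group, $3 = working directory,
# $4 = absolute path to the scrapyd executable.
render_scrapyd_unit() {
    user=$1
    group=$2
    workdir=$3
    exec_path=$4
    cat <<EOF
[Unit]
Description=Scrapyd service
After=network.target

[Service]
User=$user
Group=$group
WorkingDirectory=$workdir
ExecStart=$exec_path

[Install]
WantedBy=multi-user.target
EOF
}

# Example (not run here): look up the real scrapyd path instead of assuming it.
# render_scrapyd_unit "$USER" "$(id -gn)" /any/directory/here "$(command -v scrapyd)" \
#     | sudo tee /lib/systemd/system/scrapyd.service > /dev/null
```

Printing to stdout rather than writing the file directly lets you inspect the result before it is installed.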
    

    Then enable the service so that it starts at boot:

    sudo systemctl enable scrapyd.service
    

    Then start the service:

    sudo systemctl start scrapyd.service
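
To confirm that the service actually came up, it may help to poll the web interface on port 6800 after starting it. The following is only a sketch: wait_for_http is a hypothetical helper name, and it assumes curl is installed on the server.

```shell
#!/bin/sh
# Poll a URL until it answers successfully or the attempts run out.
# Hypothetical helper: $1 = URL, $2 = max attempts (default 10),
# $3 = seconds to sleep between attempts (default 1).
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: silent, --max-time: never hang
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (not run here):
# systemctl is-active scrapyd.service \
#     && wait_for_http http://localhost:6800/ 10 1 \
#     && echo "scrapyd is up"
```

If the check fails, sudo systemctl status scrapyd.service and journalctl -u scrapyd.service are the usual places to look for the reason.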
    

    An alternative, though not recommended, is to start Scrapyd with nohup:

    cd /path/to/your/project/folder && nohup scrapyd > /dev/null 2>&1 &
    

    Now you can close your SSH connection, and Scrapyd will keep running.

    To make sure Scrapyd also starts automatically whenever the server restarts, do the following.

    Copy the output of echo $PATH from your terminal, then open your crontab with crontab -e.

    At the very top of that file, add:

    PATH=YOUR_COPIED_CONTENT
    

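    Then, at the end of your crontab, add:

    @reboot cd /path/to/your/project/folder && nohup scrapyd > /dev/null 2>&1 &
    

    Each time the server restarts, this command will run automatically. The redirection is written as > /dev/null 2>&1 because cron runs commands with /bin/sh, which may not accept the bash shorthand >&.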
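
If you prefer not to edit the crontab by hand, the @reboot line can be appended from a script. This is a minimal sketch rather than a tested deployment recipe. The function name with_scrapyd_reboot_entry is hypothetical, and the project folder is whatever path you pass in. It reads the existing crontab on stdin and adds the entry only if an identical line is not already there, so running it twice does not create duplicates.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it with one @reboot entry for
# scrapyd appended, unless an identical line is already present.
# Hypothetical helper: $1 = project folder.
with_scrapyd_reboot_entry() {
    entry="@reboot cd $1 && nohup scrapyd > /dev/null 2>&1 &"
    current=$(cat)
    # Echo the existing crontab back unchanged (skip if it was empty).
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: fixed string, -x: whole line must match, -q: quiet
    if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example (not run here):
# crontab -l 2>/dev/null | with_scrapyd_reboot_entry /path/to/your/project/folder | crontab -
```

The PATH line still has to be at the top of the crontab, as described above, for cron to find the scrapyd executable.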