Tags: python, ubuntu, service, web-scraping

How can I write a non-stop crawler with Python and run it on a server?


I want to write a scraper in Python that crawls some URLs, scrapes data, and saves it. I know how to write it as a simple program. I'm looking for a way to deploy it on my virtual server (running Ubuntu) as a service so that it crawls non-stop. Could anyone tell me how I can do this?


Solution

  • What you want to do is daemonize the process. The python-daemon library (an implementation of PEP 3143) is helpful for creating a daemon.

    Daemonizing the process lets it run in the background, so as long as the server is up (even if no user is logged in) it will keep running.

    Here is an example daemon that writes the time to a file.

    # Requires the python-daemon package: pip install python-daemon
    import daemon
    import time
    
    def do_something():
        # Overwrite /tmp/current_time.txt with the current time every 5 seconds.
        while True:
            with open("/tmp/current_time.txt", "w") as f:
                f.write("The time is now " + time.ctime())
            time.sleep(5)
    
    def run():
        # DaemonContext detaches the process from the terminal and keeps it
        # running in the background until it exits or the server shuts down.
        with daemon.DaemonContext():
            do_something()
    
    if __name__ == "__main__":
        run()
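
    The same pattern can be adapted to the crawler itself. Below is a minimal sketch, not a complete scraper: the seed URL list, the output path `/tmp/crawl_results.txt`, the five-minute interval, and the helper names (`extract_links`, `crawl_once`) are all placeholder assumptions to adjust for your own site. It sticks to the standard library (`urllib`, `html.parser`) for fetching and parsing, and only needs python-daemon for the final daemonized loop.

    ```python
    # Sketch of a daemonized crawler; seed URLs, paths, and intervals are placeholders.
    import time
    import urllib.request
    from html.parser import HTMLParser

    try:
        import daemon  # pip install python-daemon; only needed for run() below
    except ImportError:
        daemon = None

    class LinkExtractor(HTMLParser):
        """Collects the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def extract_links(html):
        # Parse an HTML string and return the hrefs found, in document order.
        parser = LinkExtractor()
        parser.feed(html)
        return parser.links

    def crawl_once(urls, out_path):
        # Fetch each URL and append the links found to the output file.
        with open(out_path, "a") as out:
            for url in urls:
                try:
                    with urllib.request.urlopen(url, timeout=10) as resp:
                        html = resp.read().decode("utf-8", errors="replace")
                except OSError as exc:
                    out.write("%s ERROR %s: %s\n" % (time.ctime(), url, exc))
                    continue
                for link in extract_links(html):
                    out.write("%s -> %s\n" % (url, link))

    def run():
        urls = ["https://example.com/"]  # placeholder seed list
        with daemon.DaemonContext():
            while True:
                crawl_once(urls, "/tmp/crawl_results.txt")
                time.sleep(300)  # wait five minutes between passes

    if __name__ == "__main__":
        run()
    ```

    Once daemonized this way, the process survives logout; for crash recovery and automatic start on boot you would still want to wrap it in your server's service manager.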