python-2.7, scrapy, wait

Python/Scrapy wait until complete


I'm trying to get a project I'm working on to wait on the results of the Scrapy crawls. I'm pretty new to Python, but I'm learning quickly and have liked it so far. Here's my remedial function to refresh my crawls:

def refreshCrawls():
    os.system('rm JSON/*.json')

    os.system('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
    # I make this same call for 4 other crawls as well

This function gets called in a for loop in my 'main function' while I'm parsing args:

for i in xrange(1, len(sys.argv)):
    arg = sys.argv[i]
    if arg == '-r':
        pprint('Refreshing Data...')
        refreshCrawls()

This all works and does update the JSON files; however, the rest of my application does not wait on it as I foolishly expected. This wasn't really a problem until I moved the app over to a Pi, and now the poor little guy can't refresh fast enough. Any suggestions on how to resolve this?

My quick and dirty answer is to split it into a separate automated script and run it an hour or so before my automated 'main function', or to use a sleep timer, but I'd rather do this properly if there's some low-hanging fruit that can solve it for me. I do like being able to pass the refresh arg on the command line.


Solution

  • Instead of using os.system, use subprocess, which lets you wait on the spawned process:

    from subprocess import Popen
    import shlex
    import os

    def refreshCrawls():
        os.system('rm JSON/*.json')
        cmd = shlex.split('scrapy crawl TeamGameResults -o JSON/TeamGameResults.json --nolog')
        p = Popen(cmd)
        # I make this same call for 4 other crawls as well
        p.wait()  # block here until the crawl has finished

    for i in xrange(1, len(sys.argv)):
        arg = sys.argv[i]
        if arg == '-r':
            pprint('Refreshing Data...')
            refreshCrawls()