Search code examples
pythoncsvherokuweb-scrapingdownload

Can I use Heroku to Scrape Data to later Download?


I have a python script that scrapes data from different websites then writes some data into CSV files.

I run this every day from my local computer but I would like to make this automatically running on a server and possibly for free.

I tried PythonAnywhere but it looks that their whitelist stops me from scraping bloomberg.com.

I then passed to Heroku, I deployed my worker (the python script). Everything seems to work but looking with Heroku bash to the directory where the python script is supposed to write the CSV files nothing appears.

I also realised that I have no idea of how I would download those CSV files in case those were written.

I am wondering if I can actually achieve what I am trying to achieve with Heroku or if the only way to get a python script to work on a server is by paying for PythonAnywhere and avoid scraping restriction?


Solution

  • Heroku's filesystem is per-dyno and ephemeral. You must not save things to it for later use.

    One alternative is to write it to somewhere permanent, such as Amazon S3. You can use the boto library for this. Although you do have to pay for S3 storage and data, it's very inexpensive.