Tags: python, pandas, beautifulsoup, flush, fsync

Make a scraped file immediately available in Python


I am using BeautifulSoup to scrape .csv files from a selection of websites. I would then like to use them immediately in the same script and also store them for later use. Currently, when I scrape and save a file it is not immediately available to the script, and a NoneType error is raised when I attempt to load the CSV into a DataFrame. I have attempted to use

file_to_save.flush()

and

os.fsync(file_to_save.fileno())

to no avail. I have also tried opening the file unbuffered with file_to_save = open(path + filename, 'wb', 0), and this still does not work.

My code is below (res is the response.read() of the request):

file_to_save = open(path + filename, 'wb', 0)  # open unbuffered
file_to_save.write(res)
file_to_save.flush()                           # flush Python's internal buffer
os.fsync(file_to_save.fileno())                # ask the OS to write to disk
file_to_save.close()

When I re-run the script it works, because the file was saved on the previous run and can be loaded into the DataFrame in a separate function. Any ideas as to how I can make the file immediately available?


Solution

  • I could not find a satisfactory solution to this; the suggestions above all failed.

    The way I solved the problem was to load the scraped data into a pandas DataFrame and return that DataFrame through the functions so it could be used elsewhere in the web app. The file was still saved and available for the next use.
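
    A minimal sketch of this approach, assuming res holds the raw CSV bytes returned by response.read(); the save_and_load function name and the path and filename arguments are illustrative, not part of the original code:

        import io
        import os

        import pandas as pd

        def save_and_load(res, path, filename):
            # Persist the raw bytes so the file is still available on later runs.
            with open(os.path.join(path, filename), 'wb') as file_to_save:
                file_to_save.write(res)

            # Build the DataFrame directly from the bytes already in memory,
            # so nothing depends on re-reading the file that was just written.
            return pd.read_csv(io.BytesIO(res))

    The returned DataFrame can be passed straight to the rest of the web app, while the saved copy serves the next run of the script.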