I'm trying to convert a dataframe
to xlsx
, and writing it to Google Cloud storage using Cloud functions, but it's not working. It worked when I tested and deployed, but when I tried to trigger with pub/sub and scheduler, it gave the "OSError: [Errno 30] Read-only file system" error
. The code is below:
bucket_name = 'bucket-main'
blob_name = 'file-reception/'
df = client.query(query).to_dataframe()
xlsx_file = 'test'
writer = pandas.ExcelWriter(xlsx_file)
df.to_excel(writer, index=False)
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(blob_name+xlsx_file)
blob.upload_from_filename(xlsx_file)
Searched a lot on what it could be, but really don't understand why the cloud function test and deploy worked (actually wrote the wile in the gcs directory) but didn't when I used scheduler + pub/sub.
Tried to change the engine on the df.to_excel
between xlsxwritter
and openpyxl
, nothing changed.
EDIT - NEW CODE:
bucket_name = 'bucket-main'
blob_name = 'file-reception/'
df = client.query(query).to_dataframe()
xlsx_file = 'test'
**tmp_directory = '/tmp/'**
writer = pandas.ExcelWriter(**tmp_directory**+xlsx_file)
df.to_excel(writer, index=False)
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(blob_name+xlsx_file)
blob.upload_from_filename(**writer**)
You didn't specify a full path to the file (xlsx_file
), so it's really hard to say for sure where in the file system it's actually trying to create it. From the documentation:
The function execution environment includes an in-memory file system that contains the source files and directories deployed with your function (see Structuring source code). The directory containing your source files is read-only, but the rest of the file system is writeable (except for files used by the operating system). Use of the file system counts towards a function's memory usage.
So, if you end up trying to write to the "directory containing your source files", I would expect it to fail. Instead, consider specify a full path to a filesystem that is writable. Typically, this directory is the "temporary" filesystem at /tmp. But instead of hard-coding that path, you should use the API that Python gives you to get that path no matter what the host OS is.
Note that, in Cloud Functions, files written to disk actually occupy memory. If you write too much data, you could end up running out of memory and crashing. You should also delete any temporary files immediately after you don't need them anymore.