Search code examples
pythondjangoopenpyxldjango-storage

Processing files stored on cloud (S3 or Spaces)


I've setup a script to process excel files uploaded by a user. The scripts works fine when the file is stored on the local disk.

from openpyxl import load_workbook

wb = load_workbook("file_path.xlsx")  # Load workbook from disk works fine
ws = wb.worksheets[0]

I've then setup django-storages to allow user uploaded files to be stored on digital ocean spaces.

My problem now is how to access and process the cloud stored file. For the record, if I pass the file URL to load_workbook it fails with the error No such file or directory: file_url.

Do I have to download the file using requests and then process it as a local file? Feels inefficient? What options do I have?


Solution

  • You can get byte content of the file, wrap it in ContentFile and pass it to openpyxl. Assuming your model is FileContainer and field name is file:

    from django.core.files.base import ContentFile
    from openpyxl import load_workbook
    
    fc = FileContainer.objects.first()
    bytefile = fc.file.read()
    wb = load_workbook(ContentFile(bytefile))
    ws = wb.worksheets[0]
    

    I checked it with S3 and it works just fine.

    If you want to actually save file locally, you can try this:

    from django.core.files.base import ContentFile
    from django.core.files.storage import FileSystemStorage
    from openpyxl import load_workbook
    
    fc = FileContainer.objects.first()
    local_storage = FileSystemStorage()
    
    bytefile = fc.file.read()
    newfile = ContentFile(bytefile)
    
    relative_path = local_storage.save(fc.file.name, newfile)
    
    wb = load_workbook(local_storage.path(relative_path))
    ws = wb.worksheets[0]