
Celery, Django and S3 Default Storage causes file reading issues


I have a process whereby a web server injects a file (via an upload), saves that file to S3 using default_storages, then creates a task for that file to be processed by the backend via celery.

from django.core.files.storage import default_storage
from django.http import HttpResponse

def upload_file(request):
  path = 'uploads/my_file.csv'
  with default_storage.open(path, 'w') as file:
    file.write(request.FILES['upload'].read().decode('utf-8-sig'))
  process_upload.delay(path)
  return HttpResponse()

import csv

from celery import shared_task

@shared_task
def process_upload(path):
  with default_storage.open(path, 'r') as file:
    dialect = csv.Sniffer().sniff(file.read(1024))
    file.seek(0)
    reader = csv.DictReader(file, dialect=dialect)
    for row in reader:
      # etc...

The problem is that, although I'm explicitly using text mode for both writing and reading, when I read the file back it comes through as bytes, which the csv module cannot handle. Is there any way around this without reading in and decoding the whole file in memory?


Solution

  • It seems you need to add b (binary mode) to the open call:

    From the docs:

    'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

    @shared_task
    def process_upload(path):
      with default_storage.open(path, 'rb') as file:
        # Rest of your code goes here.
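
Since csv still needs text, one way to follow this advice without decoding the whole file in memory is to wrap the binary handle in io.TextIOWrapper, which decodes lazily as the reader pulls data. A sketch of the idea; here io.BytesIO stands in for the file-like object that default_storage.open(path, 'rb') returns, and read_rows is a hypothetical helper name:

    import csv
    import io

    def read_rows(binary_file):
        """Lazily decode a binary file-like object and parse it as CSV."""
        # utf-8-sig matches the encoding the upload view used and strips the BOM.
        text = io.TextIOWrapper(binary_file, encoding='utf-8-sig')
        dialect = csv.Sniffer().sniff(text.read(1024))
        text.seek(0)
        return list(csv.DictReader(text, dialect=dialect))

    # Stand-in for default_storage.open(path, 'rb'), which also yields bytes.
    data = io.BytesIO('name,age\r\nada,36\r\nalan,41\r\n'.encode('utf-8-sig'))
    rows = read_rows(data)

In the task itself you would pass the storage file straight to the wrapper, e.g. `io.TextIOWrapper(default_storage.open(path, 'rb'), encoding='utf-8-sig')`, and iterate the DictReader row by row instead of materializing a list.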