I have a process whereby a web server receives a file (via an upload), saves that file to S3 using `default_storage`, then creates a task for that file to be processed by the backend via celery.
def upload_file(request):
    path = 'uploads/my_file.csv'
    with default_storage.open(path, 'w') as file:
        file.write(request.FILES['upload'].read().decode('utf-8-sig'))
    process_upload.delay(path)
    return HttpResponse()
@shared_task
def process_upload(path):
    with default_storage.open(path, 'r') as file:
        dialect = csv.Sniffer().sniff(file.read(1024))
        file.seek(0)
        reader = csv.DictReader(file, dialect=dialect)
        for row in reader:
            # etc...
The problem is that, although I'm explicitly using text mode for both writing and reading, the file comes back as bytes when I read it, which the csv library cannot handle. Is there any way around this without reading in and decoding the whole file in memory?
Seems like you need to add `b` (binary mode) to the `open` call:
From the docs:
'b'
appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.
@shared_task
def process_upload(path):
    with default_storage.open(path, 'rb') as file:
        # Rest of your code goes here.
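Since `csv` still needs text, one way to bridge the gap without decoding the whole file up front is to wrap the binary handle in `io.TextIOWrapper`, which decodes lazily as the reader consumes it. A minimal sketch of that idea, using `io.BytesIO` as a stand-in for the binary file handle `default_storage.open(path, 'rb')` would return:

```python
import csv
import io

def process_rows(binary_file):
    # Wrap the bytes stream in a lazily-decoding text layer.
    # 'utf-8-sig' strips a BOM if one survived the upload;
    # newline='' is what the csv module expects.
    text_file = io.TextIOWrapper(binary_file, encoding='utf-8-sig', newline='')
    dialect = csv.Sniffer().sniff(text_file.read(1024))
    text_file.seek(0)
    return list(csv.DictReader(text_file, dialect=dialect))

# io.BytesIO stands in for default_storage.open(path, 'rb') here.
sample = io.BytesIO(b'name,age\r\nalice,30\r\nbob,25\r\n')
rows = process_rows(sample)
# rows[0] == {'name': 'alice', 'age': '30'}
```

This keeps memory use bounded: only the 1024-byte sniff sample and one row at a time are decoded, rather than the whole file.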