Yet another question about the style and the good practices. The code, that I will show, works and do the functionality. But I'd like to know is it ok as solution or may be it's just too ugly?
As the question is a little bit obscure, I will give some points at the end.
So, the use case.
I have a site with the items. There is a functionality to add the item by user. Now I'd like a functionality to add several items via a csv-file.
How should it works?
I saw already examples for bulk upload for django, and they seem pretty clear. But I don't find an example with an intermediary "verify-confirm" page. So how I did it :
def upload_item_csv_file(request):
if request.method == 'POST':
form = UploadItemCsvFileForm(request.POST, request.FILES)
if form.is_valid():
uploaded_file_name = handle_uploaded_item_csv_file(request.FILES['item_csv_file'])
request.session['uploaded_file'] = uploaded_file_name
return redirect('show_upload_csv_item')
else:
form = UploadItemCsvFileForm()
return render(request, 'myapp/item_csv_upload.html', {'form': form})
def handle_uploaded_item_csv_file(f):
now = datetime.now()
# YY_mm_dd_HH_MM
dt_string = now.strftime("%Y_%m_%d_%H_%M")
file_name = os.path.join(settings.MEDIA_ROOT, f"tmp_csv/item_csv_{dt_string}.csv")
with open(file_name, 'wb+') as destination:
for chunk in f.chunks():
destination.write(chunk)
return f"tmp_csv/item_csv_{dt_string}.csv"
@transaction.atomic
def show_uploaded_file(request):
if 'uploaded_file' in request.session :
file_name = request.session['uploaded_file']
else :
print("Something wrong : raise 404")
raise Http404
if not os.path.isfile(os.path.join(settings.MEDIA_ROOT, file_name)):
print("Something wrong, file does not exist : raise 404")
raise Http404
with open(os.path.join(settings.MEDIA_ROOT, file_name)) as csvfile :
fieldnames = ['serial_number', 'type', 'shipping_date', 'comments']
csv_reader = csv.DictReader(csvfile, delimiter=';', fieldnames=fieldnames)
list_items = list(csv_reader)
if request.POST and ("confirm_items_upload" in request.POST) :
if request.POST["confirm_items_upload"] == "yes" :
for cur_item in list_items :
if not cur_item['shipping_date'] :
cur_item.pop('shipping_date', None)
try :
Item.objects.create(**cur_item)
except IntegrityError :
messages.warning(request, f"This Item : {cur_item} - already exists. No items were added." )
os.remove(os.path.join(settings.MEDIA_ROOT, file_name))
return redirect('items')
else :
return render(request, 'myapp/item_csv_uploaded.html', {'items': list_items})
class UploadItemCsvFileForm(forms.Form):
item_csv_file = forms.FileField()
Here are the questions/points.
a) Even if obviously it could be better, is this solution is acceptable or not at all ?
b) I pass 'uploaded_file' from one view to another using "request.session" is it a good practice? Is there another way to do it without using GET variables?
c) At first my wish was to avoid to save the csv-file. But I could not figure out how to do it? Reading all the file to request.session seems not a good idea for me. Is there some possibility to upload the file into memory in Django?
d) If I have to use the tmp-file. How should I handle the situation if user abandon upload at the middle (for example, he sees the confirmation page, but does not click "yes" and decide to re-write his file). How to remove the tmp-file?
e) Small additional question : what kind of checks there are in Django about uploaded file? For example, how could I check that the file is at least a text-file? Should I do it?
All others remarks are welcome as well.
a) Even if obviously it could be better, is this solution is acceptable or not at all ?
I think it has some problems you want to address, but the general idea of using the filesystem and storing just filenames can be acceptable, depending on how many users you need to serve and what guarantees regarding data consistency and concurrent accesses you want to make.
I would consider the uploaded file temporary data that may be lost on system failure. If you want to provide any guarantees of not losing the data, you want to store it in a database instead of on the filesystem.
b) I pass 'uploaded_file' from one view to another using "request.session" is it a good practice? Is there another way to do it without using GET variables?
There are up- and downsides to using request.session.
The last point correlates with the leftover files problem: If you lose your information about which files are still needed, it makes cleaning up harder (although, in theory, you can retrieve which files are still needed from the session store).
If it is a problem that sessions might end or change because users clear their cookies or change devices, you could consider adding the filename to the UserProfile
in the database. This way, it is not bound to sessions.
c) At first my wish was to avoid to save the csv-file. But I could not figure out how to do it? Reading all the file to request.session seems not a good idea for me. Is there some possibility to upload the file into memory in Django?
You want to store state. The go-to ways of storing state are the database or a session store. You could load the whole CSVFile and put it into the database as text. Whether this is acceptable depends on your databases ability to handle large, unstructured data. Traditional databases were not originally built for that, however, most of them can handle small binary files pretty well nowadays. A database could give you advantages like ACID guarantees where concurrent writes to the same file on the file system will likely break the file. See this discussion on the dba stackexchange
Your database likely has documentation on the topic, e.g. there is this page about binary data in postgres.
d) If I have to use the tmp-file. How should I handle the situation if user abandon upload at the middle (for example, he sees the confirmation page, but does not click "yes" and decide to re-write his file). How to remove the tmp-file?
Some ideas:
You probably want to limit your server to one file stored per user so that attackers can not fill your filesystem.
e) Small additional question : what kind of checks there are in Django about uploaded file? For example, how could I check that the file is at least a text-file? Should I do it?
You definitely want to set up some maximum file size for the file, as described e.g. here. You could limit the allowed file extensions, but that would only be a usability thing. Attackers could also give you garbage data with any accepted extension.
Keep in mind: If you only store the csv as text data that you load and parse everytime a certain view is accessed, this can be an easy way for attackers to exhaust your servers, giving them an easy DoS attack.
Overall, it depends on what guarantees you want to make, how many users you have and how trustworthy they are. If users might be malicious, you want to keep all possible kinds of data extraction and resource exhaustion attacks in mind. The filesystem will not scale out (at least not as easily as a database).
I know of a similar setup in a project where only a handful of priviliged users are allowed to upload stuff, and we can tolerate deletion of all temporary files on failure. Users will simply have to reupload their files. This works fine.