I'm having difficulty passing the path of a file to a library called Textract.
def file_to_string(filepath):
    text = textract.process(filepath)
    print text
    return text
Here is my upload form in views.py
if request.method == 'POST':
    upload_form = UploadFileForm(request.POST, request.FILES)
    if upload_form.is_valid():
        file = request.FILES['file']
        filetosave = File(file=file, filename=file.name)
        filetosave.save()
        if validate_file_extension(file):
            request.session['text'] = file_to_string(file)  # something in here
else:
    upload_form = UploadFileForm()
models.py
class File(models.Model):
    filename = models.CharField(max_length=200)
    file = models.FileField(upload_to='files/%Y/%m/%d')
    upload_date = models.DateTimeField(auto_now_add=True)
    status = models.CharField(max_length=200)

    def __unicode__(self):
        return self.filename
Now Textract expects a path to go into file_to_string(filepath). If I try to pass in the file object, it gives me an error: "coercing to Unicode: need string or buffer, InMemoryUploadedFile found".
But if it is an InMemoryUploadedFile type, how do I get the path? I understand this is stored in memory and doesn't have a path.
How should I handle this -- should I save the file object first and then try to access it?
If I save the file and then try request.session['text'] = file_to_string(file.name), it gives a MissingFileError, though the docs say that this should give the name of the file including the relative path from MEDIA_ROOT.
Thanks a lot in advance.
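request.session['text'] = file_to_string(filetosave.file.path)
should do the trick. Once filetosave.save() has run, filetosave.file.path is the absolute filesystem path of the stored file (with the default local file storage), which is what Textract needs. file.name on the uploaded object is only the original filename, so Textract cannot find it on disk.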
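
If you would rather not save through the model first, or your storage backend has no local path, another option is to copy the upload to a temporary file and hand that path to Textract. This is only a minimal sketch. It assumes the object behaves like Django's UploadedFile (a name attribute and a chunks() method that yields bytes), and upload_to_temp_path is a hypothetical helper name, not part of Django or Textract:

```python
import os
import tempfile


def upload_to_temp_path(uploaded_file):
    """Copy an uploaded file to disk and return the path of the copy.

    Assumption: `uploaded_file` has a `name` attribute and a `chunks()`
    method yielding bytes, as Django's UploadedFile does.
    """
    # Keep the original extension, since textract chooses its parser from it.
    suffix = os.path.splitext(uploaded_file.name)[1]
    # delete=False keeps the file on disk after the `with` block closes it;
    # the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        for chunk in uploaded_file.chunks():
            tmp.write(chunk)
    return tmp.name
```

In the view this would be used roughly as path = upload_to_temp_path(file), then file_to_string(path), then os.remove(path) once the text has been extracted.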