I'm using FastAPI (not async) and the python-docx library to read an uploaded .docx file, but I get an error when reading it.
My code:

```python
from docx import Document
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()


@app.post('/translate_docx', response_class=PlainTextResponse)
def translateDocx(docFile: UploadFile = File(...), fileExtension: str = Form(...)):
    if fileExtension == 'docx':
        raw_txt = readDocx(docFile.file.read())
        return raw_txt


def readDocx(file):
    doc = Document(file)
    txt = ""
    for para in doc.paragraphs:
        txt = txt + para.text
    return txt
```
Logs:
```
  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/phys_pkg.py", line 101, in __init__
    self._zipf = ZipFile(pkg_file, 'r')
  File "/usr/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.7/zipfile.py", line 1321, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/usr/lib/python3.7/zipfile.py", line 259, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'
```
What is wrong with my code? Any help would be appreciated.
Don't `.read()` the file that you pass to `Document()`. Just give it the filename, or give it an open file whose cursor is at offset 0. If the "file" you have is already a `bytes` object, wrap it in an `io.BytesIO` "in-memory" file and pass that to `Document()`.
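The argument to `Document()` can be a `str` file path or a file-like object (an open file created with `open(...)` or an in-memory file created with `io.BytesIO`), but it cannot be a `bytes` object (which is what `file.read()` returns). As the traceback shows, python-docx passes its input straight to `zipfile.ZipFile`, which calls `.seek()` on it; `bytes` has no such method, hence the `AttributeError`.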
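To reproduce the failure without python-docx or FastAPI, here is a standard-library-only sketch. It relies on what the traceback shows: python-docx hands its input to `zipfile.ZipFile` (a .docx file is itself a ZIP archive). `make_zip_bytes` and `open_zip` are hypothetical helpers used only for this demonstration.

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    # Build a tiny ZIP archive entirely in memory and return its raw bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hi")
    return buf.getvalue()


def open_zip(source):
    """Open `source` with zipfile and return the archive's member names."""
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


data = make_zip_bytes()

# Raw bytes: ZipFile treats the object as a file and calls .seek() on it,
# which bytes lacks, so this raises the same AttributeError as the traceback.
try:
    open_zip(data)
except AttributeError as exc:
    print("bytes:", exc)

# The same bytes wrapped in io.BytesIO form a seekable file object.
print("BytesIO:", open_zip(io.BytesIO(data)))
```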