Tags: python, python-docx, fastapi

Reading a .docx file with FastAPI and python-docx: AttributeError: 'bytes' object has no attribute 'seek'


I'm using FastAPI (a regular, non-async endpoint) and the python-docx library to read an uploaded .docx file, but reading the file fails with the error below.

My code:

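```python
from docx import Document
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()


@app.post('/translate_docx', response_class=PlainTextResponse)
def translateDocx(docFile: UploadFile = File(...), fileExtension: str = Form(...)):

    if(fileExtension == 'docx'):
        # docFile.file.read() returns the upload's contents as bytes
        raw_txt = readDocx(docFile.file.read())

    return raw_txt


def readDocx(file):
    doc = Document(file)
    txt = ""
    for para in doc.paragraphs:
        txt = txt + para.text
    return txt
```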

Logs:

```
  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/phys_pkg.py", line 101, in __init__
    self._zipf = ZipFile(pkg_file, 'r')
  File "/usr/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.7/zipfile.py", line 1321, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/usr/lib/python3.7/zipfile.py", line 259, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'
```
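The last frame of the traceback points at the cause: zipfile calls fpin.seek(0, 2) on whatever python-docx hands to ZipFile. The standard-library-only sketch below (no FastAPI or python-docx involved; the helper make_zip_bytes and the archive member hello.txt are hypothetical) reproduces the same AttributeError with raw bytes and shows that wrapping the identical bytes in io.BytesIO lets ZipFile open them.

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a tiny .zip archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


# Same kind of value that docFile.file.read() produces: plain bytes.
data = make_zip_bytes()

# ZipFile treats any argument that is not a path as a file-like object and
# calls .seek() on it, which a bytes object does not have.
error_message = None
try:
    zipfile.ZipFile(data)
except AttributeError as exc:
    error_message = str(exc)
print("bytes:", error_message)

# Wrapping the same bytes in io.BytesIO gives ZipFile a seekable stream.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()
print("BytesIO:", names)
```

A .docx file is a zip archive, which is why python-docx ends up in zipfile and fails the same way.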

What is wrong with my code? Any help would be appreciated.


Solution

  • Don't call .read() on the file before passing it to Document(). Give Document() either a file path or an open file-like object whose cursor is at offset 0. If what you already have is a bytes object, wrap it in an io.BytesIO "in-memory" file and pass that to Document().

    The argument to Document() can be a str file path or a file-like object (an open file created with open(..., 'rb') or an in-memory file created with io.BytesIO), but it cannot be a bytes object, which is what file.read() returns. python-docx hands a non-path argument to zipfile.ZipFile, which calls .seek() on it, and bytes has no .seek() method.