Converting .doc to pure text using Python


I am trying to use textract to convert my .doc files to pure text.

import textract
text = textract.process('path/to/file.extension')

But I am getting this error:

AttributeError: 'module' object has no attribute 'process'

Solution

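  • Make sure that the Python file you are running is not itself named textract.py.

    Python searches the script's own directory before the installed packages, so with that name "import textract" loads your script instead of the library. Your script defines no process function, so you get the error: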

    AttributeError: 'module' object has no attribute 'process'
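
A quick way to confirm this is to look for a local file that would be imported ahead of the installed package. The following is only a minimal sketch: the helper name find_shadowing_file is made up for this illustration and is not part of textract, and it checks just one directory (by default the current one) rather than the whole import path.

```python
from pathlib import Path


def find_shadowing_file(module_name, directory="."):
    """Return a local file that would shadow `module_name`, or None.

    Hypothetical helper for illustration; not part of textract.
    It checks only `directory`, not every entry on sys.path.
    """
    base = Path(directory)
    candidates = [
        base / f"{module_name}.py",           # a script named like the package
        base / module_name / "__init__.py",   # a local package with the same name
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    shadow = find_shadowing_file("textract")
    if shadow is not None:
        print(f"Rename {shadow}: it hides the installed textract package.")
    else:
        print("No local textract.py or textract package found in this directory.")
```

If the check finds a file, rename it and run the script again. On Python 2, which the wording of the error message suggests, also delete any leftover textract.pyc in the same directory, because it can still be imported after the .py file has been renamed.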