Trouble with accessing Word using Win32 and COM


Recently I have been trying to convert .doc files into a newer format so that the data is easier to work with. I decided to convert them to .docx, since there is a lot of flexibility from there, and I thought this task would be easy. I was wrong. I am trying to use pywin32 (win32com) to drive Word through COM, and for some reason it isn't working. Here is my code:

import win32com.client as win32
import os
import re
from win32com.client import constants 

def SaveAsDocx(path):
  word = win32.gencache.EnsureDispatch("Word.Application")
  doc = word.Documents.Open(path)
  doc.Activate()

  new_file_abs = os.path.abspath(path)
  new_file_abs = re.sub(r'\. \w+$', '.docx', new_file_abs)

  word.ActiveDocument.SaveAs(
      new_file_abs, FileFormat=constants.wdFormatXMLDocument
  )
  doc.Close(False)
  print('done')

SaveAsDocx("(1)2014-06-18.doc")

The error I get is:

Traceback (most recent call last):
  File "c:/Users/gawel/OneDrive/Desktop/scraping/doctotxt.py", line 20, in <module> 
    SaveAsDocx("(1)2014-06-18.doc")
  File "c:/Users/gawel/OneDrive/Desktop/scraping/doctotxt.py", line 9, in SaveAsDocx
    doc.Activate()
AttributeError: 'NoneType' object has no attribute 'Activate'

I have done a lot of research, and I just don't know where to go from here. I thought maybe something is wrong with my Word application, but I don't know how to fix it. Any help would be appreciated. Additionally, if anyone knows of a different approach to ultimately convert .doc files into TXT/PDF/DOCX files, please let me know. This seemingly easy project has consumed way too much of my time.


Solution

  • You are almost there and need to change just a few little things:

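    • There's no need for Activate(); you can omit it.
    • Your regular expression probably doesn't do the job: as written, r'\. \w+$' requires a space after the dot, so it never matches ".doc" and the path comes back unchanged.
    • You should quit the Word application after saving the file.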

    So this should work:

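    import os
    import re
    import win32com.client as win32
    from win32com.client import constants

    def SaveAsDocx(path):
      word = win32.gencache.EnsureDispatch("Word.Application")
      # Word runs in its own process, so give it an absolute path
      abs_path = os.path.abspath(path)
      doc = word.Documents.Open(abs_path)
      # Replace only the trailing ".doc": "a.doc" -> "a.docx"
      new_file_abs = re.sub(r'\.doc$', '.docx', abs_path, flags=re.IGNORECASE)
      doc.SaveAs(new_file_abs, FileFormat=constants.wdFormatXMLDocument)
      doc.Close(False)
      word.Application.Quit(-1)
      print('done')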
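Since the question also asks about TXT and PDF output, here is a rough sketch of a batch version built on the same SaveAs call. It assumes pywin32 and Word are installed on Windows. The names FORMATS, target_path and convert_folder are hypothetical, and the numbers in FORMATS are Word's WdSaveFormat values (wdFormatXMLDocument = 12, wdFormatPDF = 17, wdFormatText = 2). win32com is imported inside convert_folder so the path helper can be used or tested on a machine without Word.

```python
import os
import re

# Hypothetical mapping from output extension to Word's WdSaveFormat value
FORMATS = {
    ".docx": 12,  # wdFormatXMLDocument
    ".pdf": 17,   # wdFormatPDF
    ".txt": 2,    # wdFormatText
}


def target_path(path, ext):
    """Return an absolute path with only the trailing .doc swapped for ext."""
    abs_path = os.path.abspath(path)
    return re.sub(r"\.doc$", ext, abs_path, flags=re.IGNORECASE)


def convert_folder(folder, ext=".docx"):
    """Convert every .doc file in folder to ext with one Word instance (Windows only)."""
    import win32com.client as win32  # deferred so target_path works without pywin32

    word = win32.gencache.EnsureDispatch("Word.Application")
    try:
        for name in os.listdir(folder):
            if not name.lower().endswith(".doc"):
                continue  # skips .docx and any other file type
            # Absolute path, because Word does not resolve paths against this script's folder
            src = os.path.abspath(os.path.join(folder, name))
            doc = word.Documents.Open(src)
            if doc is None:
                raise RuntimeError("Word could not open " + src)
            try:
                doc.SaveAs(target_path(src, ext), FileFormat=FORMATS[ext])
            finally:
                doc.Close(False)  # close without saving changes to the original .doc
    finally:
        word.Quit()  # every document is already closed at this point
```

For example, convert_folder(".", ".pdf") would write a PDF next to each .doc in the current folder. Opening Word once and reusing it for every file avoids the cost of starting and quitting Word per document, and the try/finally blocks make sure Word is shut down even if one file fails to convert.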