Recently I have been trying to convert .doc files into a new format, so that it is easier to work with the data. So, I decided to convert the .doc files to .docx files because there is a lot of flexibility from there, and I thought this task would be easy. However, I thought wrong. I am currently trying to use Win32 to access Word and for some reason it isn't working. Here is my code:
import win32com.client as win32
import os
import re
from win32com.client import constants

def SaveAsDocx(path):
    word = win32.gencache.EnsureDispatch("Word.Application")
    doc = word.Documents.Open(path)
    doc.Activate()

    new_file_abs = os.path.abspath(path)
    new_file_abs = re.sub(r'\.\w+$', '.docx', new_file_abs)

    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatXMLDocument
    )
    doc.Close(False)
    print('done')

SaveAsDocx("(1)2014-06-18.doc")
The error I get is:
Traceback (most recent call last):
  File "c:/Users/gawel/OneDrive/Desktop/scraping/doctotxt.py", line 20, in <module>
    SaveAsDocx("(1)2014-06-18.doc")
  File "c:/Users/gawel/OneDrive/Desktop/scraping/doctotxt.py", line 9, in SaveAsDocx
    doc.Activate()
AttributeError: 'NoneType' object has no attribute 'Activate'
I have done a lot of research, and I just don't know where to go from here. I thought maybe something is wrong with my Word application, but I don't know how to fix it. Any help would be appreciated. Additionally, if anyone knows of a different approach to ultimately convert .doc files into TXT/PDF/DOCX files, please let me know. This seemingly easy project has consumed way too much of my time.
You are almost there and only need to change a few small things. In particular, you can omit the call to Activate(). So this should work:
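import os
import re
import win32com.client as win32
from win32com.client import constants

def SaveAsDocx(path):
    word = win32.gencache.EnsureDispatch("Word.Application")
    # Prefer an absolute path here; Word may not resolve a relative one against the script's folder
    doc = word.Documents.Open(path)
    # Replace only the trailing .doc extension with .docx
    new_file_abs = re.sub(r'\.doc$', '.docx', os.path.abspath(path))
    word.ActiveDocument.SaveAs(new_file_abs, FileFormat=constants.wdFormatXMLDocument)
    doc.Close(False)
    # Quit Word so no background WINWORD.EXE process is left running
    word.Application.Quit(-1)
    print('done')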
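If Documents.Open still returns None, one likely cause (an assumption on my part, not something the traceback confirms) is that the file name is relative: Word resolves it against its own working directory rather than the folder the script runs from. Here is a slightly more defensive sketch of the same idea. The names docx_target and save_as_docx are made up for illustration, and the Word part only runs on Windows with Word and pywin32 installed.

```python
import os


def docx_target(path):
    """Return the absolute path of `path` with its extension swapped for .docx."""
    root, _ext = os.path.splitext(os.path.abspath(path))
    return root + ".docx"


def save_as_docx(path):
    """Convert one .doc file to .docx through Word's COM interface."""
    # Imported lazily so docx_target stays usable on machines without pywin32.
    import win32com.client as win32
    from win32com.client import constants

    # Hand Word an absolute path and fail early if the file is missing.
    source = os.path.abspath(path)
    if not os.path.isfile(source):
        raise FileNotFoundError(source)

    target = docx_target(source)
    word = win32.gencache.EnsureDispatch("Word.Application")
    try:
        doc = word.Documents.Open(source)
        if doc is None:
            # The situation from the question: fail with a clearer message.
            raise RuntimeError("Word did not return a document for " + source)
        try:
            doc.SaveAs(target, FileFormat=constants.wdFormatXMLDocument)
        finally:
            doc.Close(False)
    finally:
        # Always shut Word down, even if the conversion failed.
        word.Quit()
    return target
```

You would call it as save_as_docx("(1)2014-06-18.doc") from the folder that contains the file. Using os.path.splitext instead of a regex also avoids touching a ".doc" that happens to appear elsewhere in the path.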