Tags: vba, automation, ms-word

Automate .doc to .htm conversion in Word


Question

We inherited an older project from another company, and it has a "help" index made up of .htm files that were converted from .doc files. The problem is that their team exported all of these files in a very outdated, unsupported encoding, so they are full of garbled special characters.

Eventually we will replace this system with one that is much easier to use and develop, but because the product came with a large user base, we need to fix this in the meantime. Is there an automation tool for this that still works today (I've tried a couple of older VB scripts), or am I going to have to manually re-export a few hundred documents? It's not a huge issue, but I think my time today would be better spent on other things.

To be very clear: I have a folder full of .doc files that need to be re-saved as .htm files with UTF-8 encoding.

What I've tried

I've been digging through several SO posts trying various solutions. My current code is as follows:

Sub ChangeDocsToTxtOrRTFOrHTML()
    Dim locFolder As String
    Dim fileType As String
    Dim oFolder As Object
    Dim tFolder As Object
    Dim fs As Object
    
    locFolder = "C:\Users\ColeD\Desktop\Help Files Angular"
    fileType = ".htm"
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    
    For Each oFile In oFolder.Files
    MsgBox ("hrtr!")
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        strDocName = strDocName & fileType
        ChangeFileOpenDirectory tFolder
        
        ActiveDocument.SaveAs2 FileName:=strDocName & fileType, _
                               FileFormat:=wdFormatHTML, _
                               Encoding:=msoEncodingUTF8

        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    MsgBox ("Done!")
End Sub

The problem is that it only opens one file and then stops.


Solution

  • It looks like you are using code copied from "Convert multiple Word documents to HTML files using VBA".

    You need to adapt that code to your scenario, which only needs HTML output rather than the other file types. The example below focuses on converting .doc and .docx files to HTML.

    Sub ConvertDocsToHtml()
        ' Late binding, so this can run from Word, Excel, or another Office host.
        Dim wordapp As Object
        Dim doc As Object
        Dim fpath As String
        Dim StrFile As String
        Dim baseName As String
        Dim outputFileName As String
    
        Set wordapp = CreateObject("Word.Application")
        wordapp.Visible = True
        wordapp.DisplayAlerts = 0 'wdAlertsNone: no prompts while saving
    
        fpath = "C:\Users\user\"          'folder path must end with a backslash
        StrFile = Dir(fpath & "*.doc*")   'matches both .doc and .docx
    
        Do While Len(StrFile) > 0
            Set doc = wordapp.Documents.Open(fpath & StrFile)
            baseName = CreateObject("Scripting.FileSystemObject").GetBaseName(StrFile)
            outputFileName = fpath & baseName & ".htm"
            Debug.Print StrFile & " -> " & outputFileName
            'FileFormat 10 = wdFormatFilteredHTML (8 is wdFormatHTML)
            'Encoding 65001 = msoEncodingUTF8
            doc.SaveAs FileName:=outputFileName, FileFormat:=10, Encoding:=65001
            doc.Close SaveChanges:=False
            StrFile = Dir
        Loop
    
        wordapp.DisplayAlerts = -1 'wdAlertsAll: restore prompts
        wordapp.Quit
    End Sub
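The question's own FileSystemObject loop can probably be made to work as well. A few things in it look wrong, although it is not certain which one makes it stop after the first file: locFolder & "Converted" has no backslash, so it points at a sibling folder named "Help Files AngularConverted" rather than a subfolder; strDocName already ends in .htm when fileType is appended again, so the output is named name.htm.htm; and the loop saves through ActiveDocument and ChangeFileOpenDirectory instead of the Document it opened and a full output path. Below is an untested sketch of that macro with those points addressed. It assumes the macro runs inside Word, writes to a Converted subfolder (created if missing), and only processes .doc/.docx files, skipping Word's hidden "~$" owner files. HtmOutputPath and ConvertFolderToUtf8Htm are hypothetical names, and the sketch saves filtered HTML; use wdFormatHTML instead if you want Word's full HTML output.

```vba
' Hypothetical helper: builds the output path, adding ".htm" exactly once.
' "C:\x\Page.doc" + "C:\x\Converted" -> "C:\x\Converted\Page.htm"
Function HtmOutputPath(ByVal srcPath As String, ByVal outFolder As String) As String
    Dim baseName As String
    baseName = Mid$(srcPath, InStrRev(srcPath, "\") + 1)   ' file name only
    If InStrRev(baseName, ".") > 0 Then
        baseName = Left$(baseName, InStrRev(baseName, ".") - 1)   ' drop extension
    End If
    If Right$(outFolder, 1) <> "\" Then outFolder = outFolder & "\"
    HtmOutputPath = outFolder & baseName & ".htm"
End Function

' Sketch (assumption: runs inside Word, output goes to a "Converted" subfolder).
Sub ConvertFolderToUtf8Htm()
    Dim fs As Object, oFolder As Object, oFile As Object
    Dim srcFolder As String, outFolder As String, ext As String
    Dim d As Document

    srcFolder = "C:\Users\ColeD\Desktop\Help Files Angular"
    outFolder = srcFolder & "\Converted"      ' note the backslash

    Set fs = CreateObject("Scripting.FileSystemObject")
    If Not fs.FolderExists(outFolder) Then fs.CreateFolder outFolder
    Set oFolder = fs.GetFolder(srcFolder)

    For Each oFile In oFolder.Files
        ext = LCase$(fs.GetExtensionName(oFile.Name))
        ' Only Word documents; skip Word's hidden "~$" owner (lock) files
        If (ext = "doc" Or ext = "docx") And Left$(oFile.Name, 2) <> "~$" Then
            Set d = Documents.Open(FileName:=oFile.Path, ReadOnly:=True, _
                                   AddToRecentFiles:=False)
            ' Save through the opened Document with a full path, UTF-8 encoded
            d.SaveAs2 FileName:=HtmOutputPath(oFile.Path, outFolder), _
                      FileFormat:=wdFormatFilteredHTML, _
                      Encoding:=msoEncodingUTF8
            d.Close SaveChanges:=wdDoNotSaveChanges
        End If
    Next oFile
    MsgBox "Done!"
End Sub
```

To try it, paste both procedures into a standard module in Word's VBA editor, adjust srcFolder, and run ConvertFolderToUtf8Htm.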