Search code examples
vbatextms-wordfso

combining excel and word VBA scripts. fso.CreateTextFile. Covert Word to text file


I have the following part of a script that converts a Word document (Previously converted from a PDF) to a text file. This is usually a function as part of a larger script but for the purposes of this question this is fine.

Sub GetTextFromWord()

    Dim fso As FileSystemObject
    Dim oWd As Object, oDoc As Object
    
    Set fso = New FileSystemObject
    Set oWd = CreateObject("word.application")
    
    Set oDoc = oWd.Documents.Open("C:\temp\PDFs\XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.doc")

    filePath = "C:\temp\PDFs\" & "TEST" & ".txt"  'filename
    Debug.Print filePath
    'open text stream as unicode
    Set fileStream = fso.CreateTextFile(filePath, overwrite:=True, Unicode:=True)
                
    fileStream.Write oDoc.Range.Text
    fileStream.Close
    oDoc.Close

    oWd.Quit

End Sub

The TEST file generated is okay however lacks the subsection numbers that would normally be present. enter image description here

When I generate the text file manually open the word doc. (File Export > Change file type > plain text (save). With options Windows Default selected, Insert line breaks unticked and allows for character substitution.

enter image description here

The generated text file is as desired.

enter image description here

When I record a macro in word for the same steps, I get the following script:

Sub Macro2()

' Macro2 Macro
'
'
    ActiveDocument.SaveAs2 FileName:= _
        "XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.txt", _
        FileFormat:=wdFormatText, LockComments:=False, Password:="", _
        AddToRecentFiles:=True, WritePassword:="", ReadOnlyRecommended:=False, _
        EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData _
        :=False, SaveAsAOCELetter:=False, Encoding:=1252, InsertLineBreaks:=False _
        , AllowSubstitutions:=True, LineEnding:=wdCRLF, CompatibilityMode:=0
End Sub

I would like to modify the first script to incorporate these parameters (mainly InsertLineBreaks:=False, AllowSubstitutions:=True - unsure if the others are required to generate the text file as exact). Ideally, I can incorporate as many as feasible to play around with and see the effect of the file generated. Things like LockComments:=False, Password:="" are not required.

How can I incorporate the script to achieve this?

fso.CreateTextFile doesn't appear to give such options so I wonder if I need to rethink this.

Link to Doc file:

https://1drv.ms/u/s!AsrLaUgt0KCLhiPc1u_vlYjFfsev?e=nlFn76

Update:

enter image description here


Solution

  • Please, try the next updated code. It replaces the VBScript object method with the one you tested:

    Sub GetTextFromWord()
        Dim fso As FileSystemObject
        Dim oWd As Object, oDoc As Object
        Const wdFormatText as Long = 2, wdCRLF as Long = 0
    
        Set fso = New FileSystemObject
        Set oWd = CreateObject("word.application")
    
        Set oDoc = oWd.Documents.Open("C:\temp\PDFs\XFA006HH - Granular Sulphamic acid - Univar - 19-05-2021.pdf.doc")
    
        Dim filePath As String: filePath = "C:\temp\PDFs\" & "TEST" & ".txt"  'filename
        Debug.Print filePath
        
        oDoc.SaveAs2 fileName:=filePath, _
            FileFormat:=wdFormatText, LockComments:=False, Password:="", _
            AddToRecentFiles:=True, WritePassword:="", ReadOnlyRecommended:=False, _
            EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData _
            :=False, SaveAsAOCELetter:=False, Encoding:=1252, InsertLineBreaks:=False _
            , AllowSubstitutions:=True, LineEnding:=wdCRLF, CompatibilityMode:=0
            
        oDoc.Close False
        oWd.Quit
    End Sub