Search code examples
vbams-word

Splitting out a Word Document using VBA


Pretty new to the world of VBA, but have been asked if I can make some QoL improvements for routine tasks at work. One task I am struggling to get VBA working for is separating out a word document based on headings? For example we would receive a document along the following lines:

Info Line 1 Info Line 2

Data_Start

-multiple lines of data-

Data_Start

-multiple lines of data-

And so on. What I would like to do is run a Macro that would separate out each "Data_Start" section into a new document - ideally with the same Document Title suffixed with A, B, C etc to denote each new section (or numerically if alphabetically is an issue). Another wishlist item is to have those documents save into the same folder as the original.

I found some short vba code that separates out based on section breaks, with the aim of using it as a jumping off point to build on. However my limited knowledge of VBA is leading to roadblocks in getting things working correctly and I am running into issues with documents saving correctly (place and/or filename).

I understand this a big request but any help would be greatly appreciated!


Solution

  • To split a document at headings you first need to find the headings. As Word's Find function can do this we don't need to resort to crawling through the document paragraph by paragraph.

    When coding in Word it is desirable to avoid using the Selection object. This is because using Selection will cause the screen to be redrawn each time it changes, slowing down the code. Instead we work with Range objects.

    When Find returns a match .Execute will return a value of True and the Range is redefined to the range of the match.

    To get the full text below the heading we can use one of the predefined bookmarks in Word, \HeadingLevel.

    To avoid using the clipboard, which slows things down when used within a loop, we make use of the FormattedText property of the range.

    Sub SplitDocumentAtHeadings()
            
        Dim findRange As Range: Set findRange = ActiveDocument.Content
        Dim newDoc As Document
        Dim index As Long
        Dim saveRange As Range
        Dim savePath As String: savePath = ActiveDocument.Path & "\"
        Dim saveName As String
        
        With findRange
            With .Find
                .ClearFormatting
                .Style = ActiveDocument.Styles(wdStyleHeading1)
                .Forward = True
                .Format = True
                .Wrap = wdFindStop
            End With
            
            Do While .Find.Execute
                index = index + 1
                'use the heading text as the save name
                saveName = .Text & index & ".docx"
                'get the heading and data that follows it
                Set saveRange = .Duplicate
                'saveRange.Move wdParagraph
                Set saveRange = saveRange.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
                Set newDoc = Documents.Add
                'use formatted text to avoid using the clipboard
                newDoc.Content.FormattedText = saveRange.FormattedText
                newDoc.SaveAs2 FileName:=savePath & saveName, Fileformat:=WdSaveFormat.wdFormatXMLDocument
                .Collapse wdCollapseEnd
            Loop
        End With
    
    End Sub
    

    EDIT:

    In response to OP's answer. Please read comments in code to understand where mistakes were made.

    Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()
    
    Application.ScreenUpdating = False
    Dim Rng As Range
    Dim DocSrc As Document
    Dim DocTgt As Document
    'incorrectly spelt variable name
    'Dim sectionCoun As Integer
    Dim sectionCount As Integer
    Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
    Set DocSrc = ActiveDocument
    sectionCount = 0
    
    With DocSrc.Range
        With .Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Format = True
            .Forward = True
            .Text = "Data_Start"
            .Style = wdStyleHeading1
            .Replacement.Text = ""
            .Wrap = wdFindStop
            '.Execute
        End With
        'Find.Found is unreliable. Use Execute instead
        'Do While .Find.Found
        Do While .Find.Execute
            'added incrementing of the section count
            sectionCount = sectionCount + 1
            Set Rng = .Paragraphs(1).Range
            Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
            Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
            With DocTgt
                'Not necessary as already set at the beginning
                'Application.ScreenUpdating = False
                .Range.FormattedText = Rng.FormattedText
                StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
                ' Strip out illegal characters
                For i = 1 To Len(StrNoChr)
                    StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
                Next
                '.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
                
                'variable sectionCount was declared as sectionCoun and is not incremented in the code
                .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
                .Close False
            End With
        
            .Start = Rng.End
            'Not needed
            '.Find.Execute
        Loop
    End With
    
    Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
    Application.ScreenUpdating = True
    

    End Sub