Pretty new to the world of VBA, but have been asked if I can make some QoL improvements for routine tasks at work. One task I am struggling to get VBA working for is separating out a word document based on headings? For example we would receive a document along the following lines:
Info Line 1 Info Line 2
Data_Start
-multiple lines of data-
Data_Start
-multiple lines of data-
And so on. What I would like to do is run a Macro that would separate out each "Data_Start" section into a new document - ideally with the same Document Title suffixed with A, B, C etc to denote each new section (or numerically if alphabetically is an issue). Another wishlist item is to have those documents save into the same folder as the original.
I found some short vba code that separates out based on section breaks, with the aim of using it as a jumping off point to build on. However my limited knowledge of VBA is leading to roadblocks in getting things working correctly and I am running into issues with documents saving correctly (place and/or filename).
I understand this a big request but any help would be greatly appreciated!
To split a document at headings you first need to find the headings. As Word's Find
function can do this we don't need to resort to crawling through the document paragraph by paragraph.
When coding in Word it is desirable to avoid using the Selection
object. This is because using Selection
will cause the screen to be redrawn each time it changes, slowing down the code. Instead we work with Range objects.
When Find
returns a match .Execute
will return a value of True and the Range is redefined to the range of the match.
To get the full text below the heading we can use one of the predefined bookmarks in Word, \HeadingLevel.
To avoid using the clipboard, which slows things down when used within a loop, we make use of the FormattedText property of the range.
Sub SplitDocumentAtHeadings()
Dim findRange As Range: Set findRange = ActiveDocument.Content
Dim newDoc As Document
Dim index As Long
Dim saveRange As Range
Dim savePath As String: savePath = ActiveDocument.Path & "\"
Dim saveName As String
With findRange
With .Find
.ClearFormatting
.Style = ActiveDocument.Styles(wdStyleHeading1)
.Forward = True
.Format = True
.Wrap = wdFindStop
End With
Do While .Find.Execute
index = index + 1
'use the heading text as the save name
saveName = .Text & index & ".docx"
'get the heading and data that follows it
Set saveRange = .Duplicate
'saveRange.Move wdParagraph
Set saveRange = saveRange.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set newDoc = Documents.Add
'use formatted text to avoid using the clipboard
newDoc.Content.FormattedText = saveRange.FormattedText
newDoc.SaveAs2 FileName:=savePath & saveName, Fileformat:=WdSaveFormat.wdFormatXMLDocument
.Collapse wdCollapseEnd
Loop
End With
End Sub
EDIT:
In response to OP's answer. Please read comments in code to understand where mistakes were made.
Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()
Application.ScreenUpdating = False
Dim Rng As Range
Dim DocSrc As Document
Dim DocTgt As Document
'incorrectly spelt variable name
'Dim sectionCoun As Integer
Dim sectionCount As Integer
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
sectionCount = 0
With DocSrc.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = True
.Forward = True
.Text = "Data_Start"
.Style = wdStyleHeading1
.Replacement.Text = ""
.Wrap = wdFindStop
'.Execute
End With
'Find.Found is unreliable. Use Execute instead
'Do While .Find.Found
Do While .Find.Execute
'added incrementing of the section count
sectionCount = sectionCount + 1
Set Rng = .Paragraphs(1).Range
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
With DocTgt
'Not necessary as already set at the beginning
'Application.ScreenUpdating = False
.Range.FormattedText = Rng.FormattedText
StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
' Strip out illegal characters
For i = 1 To Len(StrNoChr)
StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
Next
'.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
'variable sectionCount was declared as sectionCoun and is not incremented in the code
.SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
.Close False
End With
.Start = Rng.End
'Not needed
'.Find.Execute
Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub