Search code examples
vbapdfboxpowerpointpowerpoint-2013

Extract titles from powerpoint textboxes (not placeholders)?


I've a PDF file which was originally created from a PPT (which I don't have access to). I need to extract the titles/headings from each page of the PDF into a single document (format irrelevant; Excel, Notepad, Word, anything will do). The file is big therefore, cant be done manually and I will have to do it for similar files again.

I concluded that converting the PDF back into PPT format would help and I am trying to write a Subroutine in PowerPoint VBA. Please take a look at the code below and suggest what I can change to accomplish this? Alternate ideas also welcome.

Heads up: Once converted back into PPT, the titles in each slide are no longer in the 'Title' placeholder, in PowerPoint. They are just normal textboxes. I'm new to VBA, I've compiled the code by Googling.

Output: This prints out a notepad file with list of slide numbers. For each slide it prints that respective slide number as many times as there are textboxes in the slide. For example: Slide 1 has 3 textboxes therefore, the notepad reads:

"Slide: 1

Slide: 1

Slide: 1

Slide: 2

Slide: 2

Slide: 2

Slide: 2

Slide: 2

Slide: 2

Slide: 2"

Problem: It is not printing the text from the textboxes. Actually, I need the text from only the top textbox (which is first or topmost placed on the slide).

Code:

Sub GatherTitles()

On Error GoTo ErrorHandler

Dim oSlide As Slide
Dim strTitles As String
Dim strFilename As String
Dim intFileNum As Integer
Dim PathSep As String
Dim Shp As Shape

If ActivePresentation.Path = "" Then
    MsgBox "Please save the presentation then try again"
    Exit Sub
End If

#If Mac Then
    PathSep = ":"
#Else
    PathSep = "\"
#End If

On Error Resume Next  ' in case there's no title placeholder on the slide
For Each oSlide In ActiveWindow.Presentation.Slides

    For Each Shp In oSlide.Shapes
      Select Case Shp.Type
        Case MsoShapeType.msoTextBox

    strTitles = strTitles _
        & "Slide: " _
        & CStr(oSlide.SlideIndex) & vbCrLf _
        & oSlide.Shapes(1).TextFrame.TextRange.Text _
        & vbCrLf & vbCrLf

        Case Else
          Debug.Print Sld.Name, Shp.Name, "This is not a text box"
      End Select

    Next Shp
Next oSlide
On Error GoTo ErrorHandler

intFileNum = FreeFile

' PC-Centricity Alert!
' This assumes that the file has a .PPT extension and strips it off to make the text file name.
strFilename = ActivePresentation.Path _
    & PathSep _
    & Mid$(ActivePresentation.Name, 1, Len(ActivePresentation.Name) - 4) _
    & "_Titles.TXT"

Open strFilename For Output As intFileNum
Print #intFileNum, strTitles

NormalExit:
Close intFileNum
Exit Sub

ErrorHandler:
MsgBox Err.Description
Resume NormalExit

End Sub

Solution

  • You really aren't doing anything with the variable Shp beyond checking if it is a text box. I don't have enough to go on to test the solution, but before the line

    & vbCrLf & vbCrLf
    

    try inserting the line

    & ": " & Shp.TextFrame.TextRange.Text _