Tags: vba, pdf, adobe, acrobat, acrobat-sdk

Copy All Text from PDF to Windows Clipboard


I'm working in VBA (MS Office 2010) and want to extract some key words from PDF attachments that I regularly receive in Outlook.

I planned to save the PDFs as Word documents and extract the text from those, but apparently I cannot do this programmatically because I'm using Acrobat X Standard (it seems I would need Pro).

So I am now looking for a way to copy all text from a PDF document to the Windows clipboard using Acrobat Library methods. I will then paste it into Word (this copy/paste works fine when done manually, with no corruption of the text).

I have very limited experience with Acrobat and am working through the Acrobat SDK resources, but it is proving challenging.

How can I select all text in a PDF document and copy it to the Windows clipboard using Acrobat Library methods in VBA?


Solution

  • For reference, I resolved this using the code below.

    This quickly converts a PDF file into a text file; from there, key words can be selected and read into a string, put on the clipboard, etc.

    This works with Acrobat X Standard.

    Code is from http://forum.chandoo.org/threads/vba-to-convert-pdf-to-txt.14245/

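    ' The Acrobat.* types below need a reference to the Acrobat type
    ' library (Tools > References in the VBA editor).
    ' (The procedure name is arbitrary.)
    Sub ConvertPdfToText()
        Dim AcroXApp As Acrobat.AcroApp
        Dim AcroXAVDoc As Acrobat.AcroAVDoc
        Dim AcroXPDDoc As Acrobat.AcroPDDoc
        Dim Filename As String
        Dim jsObj As Object
        Dim NewFileName As String

        ' Source PDF and destination text file
        Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
        NewFileName = "U:\file.txt"

        Set AcroXApp = CreateObject("AcroExch.App")
        'AcroXApp.Show

        ' Open the PDF in Acrobat; Open returns False on failure
        Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
        If Not AcroXAVDoc.Open(Filename, "Acrobat") Then
            AcroXApp.Exit
            Exit Sub
        End If
        AcroXApp.Hide 'my addition - needed?

        ' Get the underlying PDDoc and its JavaScript bridge object
        Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
        Set jsObj = AcroXPDDoc.GetJSObject

        ' Save the document as plain text
        jsObj.SaveAs NewFileName, "com.adobe.acrobat.plain-text"

        ' Close without saving changes, then shut Acrobat down
        AcroXAVDoc.Close False
        AcroXApp.Hide
        AcroXApp.Exit

    End Sub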
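
    As a follow-up to the "read into a string, put on the clipboard" step mentioned above, here is a minimal sketch of how that might look. The helper names (ReadTextFile, CopyToClipboard, ExtractKeywordExample) and the keyword "invoice" are hypothetical and not part of the original answer. The sketch assumes the exported file is plain ANSI text (the encoding Acrobat writes is not confirmed here) and that the MSForms DataObject, created late-bound by its CLSID, is available on the machine.

    ```vba
    Option Explicit

    ' Hypothetical helper: reads a whole text file into a string using a
    ' late-bound FileSystemObject. Assumes ANSI text (the OpenTextFile default).
    Function ReadTextFile(ByVal path As String) As String
        Dim fso As Object
        Dim ts As Object
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set ts = fso.OpenTextFile(path, 1)   ' 1 = ForReading
        ' ReadAll raises an error on an empty file, so check first
        If Not ts.AtEndOfStream Then ReadTextFile = ts.ReadAll
        ts.Close
    End Function

    ' Hypothetical helper: puts a string on the Windows clipboard through the
    ' MSForms DataObject, created by CLSID so that no Forms 2.0 reference is
    ' needed in the project.
    Sub CopyToClipboard(ByVal s As String)
        Dim dataObj As Object
        Set dataObj = CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        dataObj.SetText s
        dataObj.PutInClipboard
    End Sub

    ' Illustrative usage: load the file written by the conversion code,
    ' look for a keyword, and copy the full text to the clipboard.
    Sub ExtractKeywordExample()
        Dim allText As String
        allText = ReadTextFile("U:\file.txt")

        ' vbTextCompare makes the search case-insensitive
        If InStr(1, allText, "invoice", vbTextCompare) > 0 Then
            Debug.Print "Keyword found"
        End If

        CopyToClipboard allText
    End Sub
    ```

    Run ExtractKeywordExample after the conversion has produced U:\file.txt. Late binding is used throughout so the sketch does not depend on extra library references beyond those the conversion code already needs.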