
PDF to text conversion using PDFBox in NetBeans 8.1


I'm trying to convert a PDF document to text, but I'm getting a NullPointerException and do not understand why. The error is shown at the import statement. I'm attaching the code below:

public class PDFTextParser {

    private static Object f;

    public static void main(String args[]) {
        PDFTextStripper pdfStripper = null;
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;

        File file = new File("D:\\1.pdf");
        try {
            f = null;
            PDFParser parser = new PDFParser((RandomAccessRead) f);
            FileInputStream f = new FileInputStream(file);
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(5);
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


This is the error I'm getting:

    Exception in thread "main" java.lang.NullPointerException
        at org.apache.pdfbox.pdfparser.PDFParser.<init>(PDFParser.java:138)
        at org.apache.pdfbox.pdfparser.PDFParser.<init>(PDFParser.java:102)
        at org.apache.pdfbox.pdfparser.PDFParser.<init>(PDFParser.java:61)
        at PDFTextParser.main(PDFTextParser.java:33)

Solution

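  • Yes, you are passing a null object to the PDFParser constructor. The static field f is set to null on the line before, and the FileInputStream is only created afterwards, so it never reaches the parser: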

        f = null;
        PDFParser parser = new PDFParser((RandomAccessRead) f);
    

    By the way, as a bonus, here's some more current (and much shorter) code to open a PDF file with PDFBox. I've left out the exception handling:

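        File file = new File("D:\\1.pdf");
        // PDFBox 2.x API; in PDFBox 3.x the equivalent call is Loader.loadPDF(file)
        PDDocument pdDoc = PDDocument.load(file);
        PDFTextStripper pdfStripper = new PDFTextStripper();
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(5);
        String parsedText = pdfStripper.getText(pdDoc);
        System.out.println(parsedText);
        // Release the file handle once the text has been extracted
        pdDoc.close();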
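
To see why f is still null in the question's code even though a FileInputStream named f appears a line later, here is a minimal sketch that uses only the standard library. The Reader class is a hypothetical stand-in for any constructor that dereferences its argument (as PDFParser evidently does, going by the stack trace); it is not part of PDFBox. The point is the ordering: before the local variable is declared, the name f still refers to the static field.

```java
public class NullArgumentDemo {

    // Mirrors the static field in the question; it never receives a real value.
    private static Object f;

    // Hypothetical stand-in for a constructor that dereferences its argument.
    static final class Reader {
        private final int length;

        Reader(CharSequence source) {
            // Throws NullPointerException when source is null.
            this.length = source.length();
        }

        int length() {
            return length;
        }
    }

    // Same ordering as in the question: the field is used before the local exists.
    static boolean brokenOrderThrows() {
        try {
            f = null;
            new Reader((CharSequence) f); // f is the static field here, and it is null
            String f = "data";            // the local f is declared too late to matter
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Corrected ordering: create the source first, then pass it to the constructor.
    static int fixedOrder() {
        String source = "data";
        Reader reader = new Reader(source);
        return reader.length();
    }

    public static void main(String[] args) {
        System.out.println("broken order throws NPE: " + brokenOrderThrows());
        System.out.println("fixed order length: " + fixedOrder());
    }
}
```

In the question's code the same applies: the input has to be created before the parser and passed to it directly, instead of going through the unrelated static field.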