Tags: java, pdfbox, netbeans-8.1

Writing the output of a program into a file


I have written a program to parse a PDF into text. I'm getting the output in the console, but I'm not able to write it into a file. This is the code I have written:

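import java.io.BufferedReader;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;

// PDFBox 2.x package names (PDFTextStripper lives in org.apache.pdfbox.text since 2.0)
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PDFTextParser {

    public static void main(String args[]) throws IOException {
        PDFTextStripper pdfStripper = null;
        COSDocument cosDoc = null;
        try {
            File file = new File("1.pdf");
            PDDocument pdDoc = PDDocument.load(file);
            pdfStripper = new PDFTextStripper();
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
            FileWriter out = new FileWriter("output.txt");
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line = in.readLine();
            while (line != null) {
                out.append(line);
                out.append("\n");
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}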

The output is:

Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (6:0) at offset 1013093 does not end with 'endobj' but with '7'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (7:0) at offset 1013211 does not end with 'endobj' but with '483'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (9:0) at offset 1020280 does not end with 'endobj' but with '10'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (10:0) at offset 1020396 does not end with 'endobj' but with '15'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (15:0) at offset 1020519 does not end with 'endobj' but with '16'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (16:0) at offset 1020640 does not end with 'endobj' but with '17'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (17:0) at offset 1020756 does not end with 'endobj' but with '18'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (18:0) at offset 1020874 does not end with 'endobj' but with '19'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (19:0) at offset 1020993 does not end with 'endobj' but with '24'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (24:0) at offset 1021111 does not end with 'endobj' but with '25'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (25:0) at offset 1021228 does not end with 'endobj' but with '26'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (26:0) at offset 1021350 does not end with 'endobj' but with '27'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (27:0) at offset 1021469 does not end with 'endobj' but with '28'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (28:0) at offset 1021589 does not end with 'endobj' but with '489'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (458:0) at offset 1026684 does not end with 'endobj' but with '463'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (463:0) at offset 1026809 does not end with 'endobj' but with '464'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (464:0) at offset 1026932 does not end with 'endobj' but with '465'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (465:0) at offset 1027050 does not end with 'endobj' but with '466'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (466:0) at offset 1027170 does not end with 'endobj' but with '495'

The parsed PDF text appears in the console, but the output file is empty.


Solution

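  • You already have the text from the PDF, so just write it to the file. The rest of your code tries to read input from the user (e.g. the keyboard), which you don't need: in.readLine() waits for keyboard input, so the program stops there, out.close() is never reached, and output.txt stays empty. Also, line is never reassigned inside the while loop, so typing a line would make the loop run forever. Use the code below instead: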

    String parsedText = pdfStripper.getText(pdDoc);
    System.out.println(parsedText);
    FileWriter out = new FileWriter("output.txt"); 
    out.append(parsedText);
    out.close(); // closing the writer flushes the buffered text to output.txt
    
    // No need for this code: it reads input from the user (the keyboard), not the PDF text
     /*
     BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
     String line = in.readLine();
     while (line!= null) {
    
             out.append(line);
             out.append("\n");
           }
    out.close();
    */
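A minimal, self-contained sketch of the same fix using only the JDK (Java 7+), so it can run without PDFBox. The hard-coded parsedText string stands in for pdfStripper.getText(pdDoc); the class name TextFileWriterDemo and the helpers writeText and copyLines are hypothetical names chosen for illustration. It shows writing the whole string at once and, for comparison, a line-by-line copy that uses the usual readLine() loop, which re-reads on every iteration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileWriterDemo {

    // Writes the whole string in one call. try-with-resources closes (and therefore
    // flushes) the writer even if write() throws.
    public static void writeText(Path target, String text) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Copies text line by line. readLine() is called again on every iteration and
    // returns null at end of input, so the loop terminates.
    public static void copyLines(BufferedReader in, Path target) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // writes the platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for pdfStripper.getText(pdDoc).
        String parsedText = "First line of the PDF\nSecond line of the PDF\n";

        writeText(Paths.get("output.txt"), parsedText);

        // Same text, copied line by line from an in-memory reader instead of System.in.
        try (BufferedReader in = new BufferedReader(new StringReader(parsedText))) {
            copyLines(in, Paths.get("output-lines.txt"));
        }

        System.out.println(new String(Files.readAllBytes(Paths.get("output.txt")), StandardCharsets.UTF_8));
    }
}
```

In the PDF program you would call writeText(Paths.get("output.txt"), parsedText) right after getText, and also close the document with pdDoc.close() when you are done, which the code in the question never does. UTF-8 is an assumption here; FileWriter uses the platform default encoding instead.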