I am trying to read a PDF document with Java. I am new to this, and also new to configuring dependencies with IntelliJ. The full error is:
Exception in thread "main" java.lang.NoSuchMethodError: 'void org.apache.fontbox.cmap.CMapParser.<init>(boolean)'
at org.apache.pdfbox.pdmodel.font.CMapManager.parseCMap(CMapManager.java:74)
at org.apache.pdfbox.pdmodel.font.PDFont.readCMap(PDFont.java:213)
at org.apache.pdfbox.pdmodel.font.PDFont.loadUnicodeCmap(PDFont.java:147)
at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:115)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:74)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:185)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:89)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:394)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:322)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:269)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:233)
at com.company.Main.main(Main.java:18)
My present code:
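import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public static void main(String[] args) throws IOException {
    // try-with-resources closes the document even if text extraction throws
    try (PDDocument document = PDDocument.load(new File("src/dummy.pdf"))) {
        if (!document.isEncrypted()) {
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(document);
            System.out.println("Text:" + text);
        }
    }
}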
Here are the jars I have installed:
Not sure what I am doing wrong.
Here is a link to the PDF I am trying to read. Note that its text is in Japanese:
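Tilman Hausherr was correct. The error came from a version mismatch between the PDFBox and FontBox jars: PDFBox was calling a CMapParser(boolean) constructor that does not exist in the FontBox jar on my classpath. Using the same version for both jars fixes it.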
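For anyone hitting the same NoSuchMethodError, here is a rough sketch of how the mismatch could be confirmed at runtime. It is only a diagnostic under some assumptions: the class name `JarVersionCheck` and the helpers `describe` and `hasBooleanConstructor` are made up for this example, and the reported version is whatever the jar manifest declares as Implementation-Version, so it may print null if a jar does not declare one. It uses only standard Java reflection, so it also runs (and reports "not on the classpath") when the jars are missing.

```java
import java.security.CodeSource;

public class JarVersionCheck {

    /** Describes which jar a class was loaded from and the version that jar declares. */
    static String describe(String className) {
        try {
            // Load without initializing the class; we only need its metadata.
            Class<?> cls = Class.forName(className, false, JarVersionCheck.class.getClassLoader());
            Package pkg = cls.getPackage();
            // Read from the jar manifest; null if the manifest has no Implementation-Version.
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            // CodeSource is null for classes loaded by the bootstrap class loader.
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            String location = (source == null || source.getLocation() == null)
                    ? "unknown location"
                    : source.getLocation().toString();
            return className + " -> version " + version + ", loaded from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on the classpath";
        }
    }

    /** Returns true if the named class declares a constructor taking a single boolean. */
    static boolean hasBooleanConstructor(String className) {
        try {
            Class<?> cls = Class.forName(className, false, JarVersionCheck.class.getClassLoader());
            cls.getDeclaredConstructor(boolean.class);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One class from each jar; the two reported versions should match.
        System.out.println(describe("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println(describe("org.apache.fontbox.cmap.CMapParser"));
        // This is the constructor named in the NoSuchMethodError above.
        System.out.println("CMapParser(boolean) present: "
                + hasBooleanConstructor("org.apache.fontbox.cmap.CMapParser"));
    }
}
```

If the two lines report different versions, or the last line prints false, the FontBox jar is most likely older than the PDFBox jar, and replacing it with the matching release should resolve the error.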