I'm trying to index PDF files using Apache Lucene 4.4.
I keep getting the following exception:
Exception in thread "main" java.lang.NoSuchFieldError: TOKENIZED
	at com.snowtide.pdf.lucene.LuceneInterface20.addField(SourceFile:18)
	at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:174)
	at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:84)
	at com.apache.lucene.search.EasyLuceneIntegration.addPDFToIndex(EasyLuceneIntegration.java:134)
	at com.apache.lucene.search.EasyLuceneIntegration.main(EasyLuceneIntegration.java:62)
I'm using PDFTextStream and following their example.
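The project you've referenced only supports up to Lucene 2.2. The NoSuchFieldError is thrown because its LuceneInterface20 class was compiled against a constant named TOKENIZED (most likely the old Field.Index.TOKENIZED) that no longer exists in Lucene 4.4. I'd recommend looking into Apache Tika to extract the text from your PDFs in a form you can index, or you can just use PDFBox directly (which, I believe, is the library Tika uses for PDFs).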
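If you want to confirm that a version mismatch is what's behind the missing TOKENIZED field, a small reflection check like the one below might help. This is only a sketch: the FieldCheck class, its Result enum, and the check method are names I made up for illustration, and the assumption that the constant lived on org.apache.lucene.document.Field$Index is based on older Lucene releases, not on anything in the stack trace itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper (not part of Lucene or PDFTextStream): reports whether a
// public static field with the given name is visible on the current classpath.
final class FieldCheck {

    /** Possible outcomes of the lookup. */
    enum Result { PRESENT, FIELD_MISSING, CLASS_MISSING }

    static Result check(String className, String fieldName) {
        try {
            // initialize=false: inspect the class without running its static initializers
            Class<?> cls = Class.forName(className, false, FieldCheck.class.getClassLoader());
            Field field = cls.getField(fieldName); // public fields only, inherited ones included
            return Modifier.isStatic(field.getModifiers()) ? Result.PRESENT : Result.FIELD_MISSING;
        } catch (ClassNotFoundException e) {
            return Result.CLASS_MISSING;
        } catch (NoSuchFieldException e) {
            return Result.FIELD_MISSING;
        }
    }

    public static void main(String[] args) {
        // Assumption: the constant the old integration expects lived on Field.Index.
        String indexClass = "org.apache.lucene.document.Field$Index";
        for (String name : new String[] {"TOKENIZED", "ANALYZED"}) {
            System.out.println(indexClass + "." + name + ": " + check(indexClass, name));
        }
    }
}
```

Run it with the same classpath as your indexer. If TOKENIZED comes back as FIELD_MISSING while the class itself is found, the integration jar was built for an older Lucene than the one you have, and swapping in Tika or PDFBox for the text extraction is the way out.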