In Hibernate Search 6, the Apache Tika bridge has disappeared.
What is the best way to index the contents of a PDF or a Word document file now? Is there any alternative?
You could write your own bridge, as described in the Hibernate Search reference documentation.
Something like this:

    public class TikaBridge implements ValueBridge<String, String> {
        private final Parser parser;

        public TikaBridge() {
            parser = new AutoDetectParser();
        }

        @Override
        public String toIndexedValue(String documentPath, ValueBridgeToIndexedValueContext context) {
            if (documentPath == null) {
                return null;
            }
            try (InputStream input = Files.newInputStream(Paths.get(documentPath))) {
                StringWriter writer = new StringWriter();
                WriteOutContentHandler contentHandler = new WriteOutContentHandler(writer);
                Metadata metadata = new Metadata();
                ParseContext parseContext = new ParseContext();
                parser.parse(input, contentHandler, metadata, parseContext);
                return writer.toString();
            }
            catch (IOException | SAXException | TikaException e) {
                // toIndexedValue cannot throw checked exceptions, so wrap them.
                throw new IllegalStateException("Could not extract text from " + documentPath, e);
            }
        }
    }

Then implement an annotation and its processor:

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.METHOD, ElementType.FIELD })
    @PropertyMapping(processor = @PropertyMappingAnnotationProcessorRef(
            type = TikaField.Processor.class
    ))
    @Repeatable(TikaField.List.class)
    public @interface TikaField {
        String name() default "";
        ContainerExtraction extraction() default @ContainerExtraction();

        @Retention(RetentionPolicy.RUNTIME)
        @Target({ ElementType.METHOD, ElementType.FIELD })
        @interface List {
            TikaField[] value();
        }

        class Processor implements PropertyMappingAnnotationProcessor<TikaField> {
            @Override
            public void process(PropertyMappingStep mapping, TikaField annotation,
                    PropertyMappingAnnotationProcessorContext context) {
                TikaBridge bridge = new TikaBridge();
                // A null field name means "use the property name".
                mapping.genericField(annotation.name().isEmpty() ? null : annotation.name())
                        .valueBridge(bridge);
            }
        }
    }

Then just use it on your model:

    @Indexed
    public class MyEntity {
        // ...
        @TikaField
        String myDocument;
    }

Should you need any parameters, you can add them to your annotation and pass them along to your bridge's constructor.
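To illustrate the mechanics of that last point, here is a minimal sketch in plain Java, deliberately free of Hibernate Search and Tika so it runs on its own. All names in it (`LimitedTextField`, `maxChars`, `LimitedTextBridge`, `bridgeFor`) are hypothetical stand-ins, not part of any library. In a real mapping the attribute would be read inside `Processor.process(...)` and handed to your bridge's constructor in the same way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class AnnotationParameterSketch {

    // Hypothetical annotation with one parameter (stand-in for a custom field annotation).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface LimitedTextField {
        int maxChars() default 100;
    }

    // Stand-in for the bridge: the parameter arrives through the constructor.
    static final class LimitedTextBridge {
        private final int maxChars;

        LimitedTextBridge(int maxChars) {
            this.maxChars = maxChars;
        }

        String toIndexedValue(String text) {
            if (text == null) {
                return null;
            }
            return text.length() <= maxChars ? text : text.substring(0, maxChars);
        }
    }

    // Stand-in for the processor: read the annotation attribute, build the bridge with it.
    static LimitedTextBridge bridgeFor(Class<?> modelType, String fieldName) {
        try {
            Field field = modelType.getDeclaredField(fieldName);
            LimitedTextField annotation = field.getAnnotation(LimitedTextField.class);
            return new LimitedTextBridge(annotation.maxChars());
        }
        catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }

    // Hypothetical model class using the annotation with an explicit parameter.
    static class MyModel {
        @LimitedTextField(maxChars = 5)
        String myDocument;
    }

    public static void main(String[] args) {
        LimitedTextBridge bridge = bridgeFor(MyModel.class, "myDocument");
        System.out.println(bridge.toIndexedValue("Hello, world")); // prints "Hello"
    }
}
```

The bridge itself stays unaware of the annotation: it only sees constructor arguments, which keeps it easy to test in isolation.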
If you need to populate multiple fields from a single PDF/Word document, for example to index metadata as well as the document content, then you will have to implement a PropertyBridge instead: it allows populating multiple fields instead of just one. That's a bit more complicated, but similar.