
Indexing and accessing ODT files in Solr


How can I post, index, and search the content of an ODT file stored in my solr_home directory?

I have tried to follow the pages listed below and have added a body field to the schema:

Indexing text and html files

Simple Post Tool - Confluence

The resourcename field contains the file location, but the content field is blank. I still cannot search the file contents, even though Solr reports that the file is indexed and the changes are committed. Is there any end-to-end documentation for this requirement? I am using Solr with Tomcat on a Linux machine. I'm new to Solr and may be missing details that are not mentioned in the pages above.


Solution

  • Use Apache Tika to extract the content and send it to Solr:

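    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.tika.Tika;
    import org.apache.tika.metadata.Metadata;

    Tika tika = new Tika();
    InputStream fileInputStream = new FileInputStream("d:\\fileName.odt");
    Metadata metadata = new Metadata();
    // The resource name gives Tika a hint for detecting the file type
    metadata.set(Metadata.RESOURCE_NAME_KEY, "fileName.odt");

    String content = tika.parseToString(fileInputStream, metadata);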
    

    Alternatively, you can use Solr's ExtractingRequestHandler, which runs Tika on the server side, so the file can be posted to Solr as-is.
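
For the ExtractingRequestHandler route, a minimal sketch of the HTTP request might look like the following. It uses only the JDK's built-in HTTP client (Java 11+). Several things here are assumptions, not details from the question: the handler is registered at /update/extract (as in Solr's example solrconfig.xml), Solr is reachable at http://localhost:8080/solr (a multi-core setup would also need the core name in the path), the unique key field is called id, and the class and method names are made up for illustration. The fmap.content=body parameter asks the handler to put the extracted text into the body field from the question instead of the default content field.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExtractPost {

    // Builds the request URL for the extracting handler.
    // solrBase, docId and targetField are illustrative values; the path
    // "/update/extract" is assumed to match the handler in solrconfig.xml.
    static String buildExtractUrl(String solrBase, String docId, String targetField) {
        return solrBase + "/update/extract"
                // literal.id supplies the unique key for the new document
                + "?literal.id=" + URLEncoder.encode(docId, StandardCharsets.UTF_8)
                // fmap.content renames Tika's "content" output to our field
                + "&fmap.content=" + URLEncoder.encode(targetField, StandardCharsets.UTF_8)
                // commit right away so the document becomes searchable
                + "&commit=true";
    }

    public static void main(String[] args) throws Exception {
        // Assumed host, port and file name; adjust to your installation.
        String url = buildExtractUrl("http://localhost:8080/solr", "fileName.odt", "body");

        // Send the raw file bytes as the request body; Solr runs Tika on them.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/vnd.oasis.opendocument.text")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("fileName.odt")))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Print the status and Solr's response body for inspection.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

For the text to be searchable, the target field (body here) has to be declared as indexed in the schema. If the content still comes back blank, it is worth checking that the Tika and Solr Cell jars are on Solr's classpath.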