solr apache-tika solr-cell

NoClassDefFoundError MimeTypeException with PDF extraction


I am getting an exception when trying to use update/extract with PDF files.

My setup is: Ubuntu Server 11.10, Tomcat 6, Solr 3.5.0.2011.11.22.15.54.38.

I can browse to solr/admin OK

I have put all the contrib/extract libraries and apache-solr-cell3.5.0.jar into the Tomcat folder webapps/solr/WEB-INF/lib.

I am calling extract using:

curl "http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true" -F "file=/path/to/my.pdf"

The error is:

java.lang.NoClassDefFoundError: org/apache/tika/mime/MimeTypeException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:383)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:461)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)

I would appreciate any pointers. The only other place this error seems to come up is with Nutch and cached results.

I have also tried sending the MIME type in the query string, and tried a *.doc file, but got the same error.


Solution

  • This was caused by a basic mistake: I had copied the necessary Tika libraries to tomcat6/webapps/solr/WEB-INF/lib but left the jar files owned by root instead of chown-ing them to the tomcat6 user. After fixing the ownership and restarting Tomcat, extraction started working.
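
For anyone hitting the same error, a quick ownership check along these lines may help narrow it down. This is only a sketch: the helper name jars_not_owned_by is made up for illustration, it relies on GNU stat (as shipped with Ubuntu), and the /var/lib/tomcat6 path, the tomcat6 group and the service name in the commented commands are assumptions about a stock Ubuntu tomcat6 package, so adjust them to your install.

```shell
#!/bin/sh
# Hypothetical helper: list jar files in a directory that are NOT owned
# by the expected user. Usage: jars_not_owned_by DIR USER
jars_not_owned_by() {
    dir=$1
    expected=$2
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue          # the glob matched nothing
        owner=$(stat -c '%U' "$jar")       # GNU stat: owner's user name
        if [ "$owner" != "$expected" ]; then
            printf '%s (owner: %s)\n' "$jar" "$owner"
        fi
    done
}

# Example invocation (path, user and group are assumptions, see above):
#   jars_not_owned_by /var/lib/tomcat6/webapps/solr/WEB-INF/lib tomcat6
# If any jars are listed, hand them to the Tomcat user and restart:
#   sudo chown tomcat6:tomcat6 /var/lib/tomcat6/webapps/solr/WEB-INF/lib/*.jar
#   sudo service tomcat6 restart
```

If the function prints nothing, every jar in that directory is already owned by the expected user and the cause is probably elsewhere.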