Solr index custom file types


I am new to Solr and have no experience with it; our Solr expert has left the company. We are receiving a file from a client in a proprietary format, and I don't have access to the application that generated it.

When uploading the file to Solr, we receive the following error:

SOLR Log
solr-cloud.log: {"msg":"2022-01-19 08:10:06.915 ERROR (qtp349420578-3516) [c:<collection> s:shard2 r:core_node5 x:<redacted>] o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: ucar/nc2/NetcdfFile"}

Our App logging
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/<collection>: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/<collection>/update/extract. Reason:
<pre>    Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NoClassDefFoundError: ucar/nc2/NetcdfFile
        at org.apache.tika.parser.hdf.HDFParser.parse(HDFParser.java:88)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)

Other, ordinary file types work (e.g. doc, pdf, zip).

  1. I cannot open or edit the file to see which fields it contains. Is there a way to index it anyway?
  2. If not, is there anything else I can do to handle this file type?

TIA


Solution

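  • The file is being parsed by Solr/Tika using the HDF parser (org.apache.tika.parser.hdf.HDFParser in the stack trace), which in turn depends on the NetCDF-Java library: https://www.unidata.ucar.edu/downloads/netcdf-java/. The NoClassDefFoundError for ucar/nc2/NetcdfFile means that library's classes are not on Solr's classpath, so making the NetCDF-Java jar available to Solr should let the parser load.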
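
One way to confirm whether the NetCDF classes are visible is a small classpath probe. The following is only a sketch: the class name ClasspathCheck and the helper isClassAvailable are hypothetical names made up for illustration, and only ucar.nc2.NetcdfFile comes from the error above. It checks the classpath of the JVM it runs in, so the result is meaningful only if it is launched with the same jars that Solr loads.

```java
// Hypothetical helper, not part of Solr or Tika.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the NoClassDefFoundError from the Solr log
        String name = "ucar.nc2.NetcdfFile";
        String status = isClassAvailable(name) ? "is on the classpath" : "is MISSING from the classpath";
        System.out.println(name + " " + status);
    }
}
```

If the probe reports the class as missing even with the NetCDF-Java jar supplied, the jar is probably not in a location that this JVM actually reads.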