We have a requirement where both structured and unstructured data come into the system. We need to index both of them and then enable search over them. We are using SolrCloud on a Hadoop platform. For structured data, we are planning to put the data into HBase, and for unstructured data, directly into HDFS.
My question is: how do I index these two sources under a single Solr core? Is it possible to index both structured and unstructured data in a single core/collection in SolrCloud and then search over that one index?
Thanks in advance.
At best, you can have a single Solr schema that contains every field name you might need, i.e. the fields for both your structured and your unstructured data. Also, since your unstructured data may bring in new fields, you can append more field names to the existing schema later. If you cannot add fields, you will need to find some other way to make this work.
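As a rough illustration of a schema that is the union of both field sets, here is a minimal Python sketch that builds the JSON body for Solr's Schema API `add-field` command. It assumes a managed (mutable) schema so fields can be added over HTTP; the field names, the `source` field and the type names (`string`, `pdouble`, `pdate`, `text_general`, as found in recent default configsets) are all hypothetical and should be replaced with what your HBase table, your HDFS files and your Solr version actually use. It only builds the payload; it does not send it.

```python
import json

# Hypothetical fields for the structured (HBase) data.
STRUCTURED_FIELDS = {
    "customer_id": "string",
    "order_total": "pdouble",
    "order_date": "pdate",
}
# Hypothetical fields for the unstructured (HDFS) data.
UNSTRUCTURED_FIELDS = {
    "content": "text_general",
    "file_path": "string",
}
# Hypothetical shared field recording where each document came from.
COMMON_FIELDS = {"source": "string"}


def build_add_field_payload(*field_groups):
    """Merge several field groups into one Schema API 'add-field' command."""
    merged = {}
    for group in field_groups:
        for name, field_type in group.items():
            # The same field name must not be declared with two different types.
            if name in merged and merged[name] != field_type:
                raise ValueError(f"conflicting types for field {name!r}")
            merged[name] = field_type
    return {
        "add-field": [
            {"name": name, "type": field_type, "indexed": True, "stored": True}
            for name, field_type in sorted(merged.items())
        ]
    }


if __name__ == "__main__":
    payload = build_add_field_payload(COMMON_FIELDS, STRUCTURED_FIELDS, UNSTRUCTURED_FIELDS)
    # This body would be POSTed to http://<solr-host>:8983/solr/<collection>/schema
    print(json.dumps(payload, indent=2))
```

Keeping every field in one merged definition means a single collection can accept documents from either source, and the type-conflict check catches the case where both sources use the same field name for different kinds of data.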
For your structured data, you then populate only the fields specific to the structured data and leave the other fields out of those documents; your unstructured documents likewise carry only their own fields.
Within a single Solr core (or SolrCloud collection) and a single index, this is how documents with different structures can live side by side.
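A minimal Python sketch of what such mixed documents might look like before they are sent to Solr's `/update` handler as a JSON array. The helper names (`structured_doc`, `unstructured_doc`, `update_body`), the `hbase:`/`hdfs:` id prefixes and the `source` field are hypothetical choices for illustration; reading from HBase and HDFS, and the actual HTTP POST, are left out.

```python
import json


def structured_doc(row_key, columns):
    """Build a Solr document from one HBase row (hypothetical column names)."""
    # Only structured fields are set; 'content' and 'file_path' are simply absent.
    doc = dict(columns)
    # Set id and source last so a column cannot overwrite them.
    doc["id"] = f"hbase:{row_key}"
    doc["source"] = "hbase"
    return doc


def unstructured_doc(path, text):
    """Build a Solr document from the extracted text of one HDFS file."""
    # Only unstructured fields are set; the structured fields are simply absent.
    return {"id": f"hdfs:{path}", "source": "hdfs", "file_path": path, "content": text}


def update_body(docs):
    """Serialize documents as a JSON array, the shape /update accepts."""
    return json.dumps(docs)


if __name__ == "__main__":
    docs = [
        structured_doc("row-001", {"customer_id": "C42", "order_total": 99.5}),
        unstructured_doc("/data/in/notes.txt", "Customer C42 asked about a refund."),
    ]
    # This body would be POSTed to http://<solr-host>:8983/solr/<collection>/update
    print(update_body(docs))
```

Once both kinds of documents are in the same collection, one query searches all of them, and a filter such as `fq=source:hbase` or `fq=source:hdfs` narrows results to a single origin when needed.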
Please let me know if you meant something different in your question.