I'm using Solr and Cassandra (via DSE). Here is one entry (row) of data in Cassandra:
ORDER_INFO_CF
-orderHistoryID=1000072459
-SPECIAL_COLUMN_KEY=0800000002||1294034400000|113942
I can index the Cassandra data without an issue, with this schema.xml:
<schema name="ORDER_INFO_CF" version="1.1">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text" class="solr.TextField">
<analyzer><tokenizer class="solr.WikipediaTokenizerFactory"/></analyzer>
</fieldType>
</types>
<fields>
<field name="orderHistoryID" type="string" indexed="true" stored="true"/>
<field name="SPECIAL_COLUMN_KEY" type="text" indexed="true" stored="true"/>
</fields>
Of course, having all the data lumped into one pipe-delimited string doesn't help very much. So I tried to split it using the PatternTokenizerFactory, like this (schema.xml):
<schema name="ORDER_INFO_CF" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" />
<fieldType name="splitField" class="solr.TextField">
<analyzer><tokenizer class="solr.PatternTokenizerFactory" pattern="|" /></analyzer>
</fieldType>
</types>
<fields>
<field name="orderHistoryID" type="string" indexed="true" stored="true"/>
<field name="AccountNumber" type="splitField" indexed="true" stored="true"/>
<field name="ActionFlag" type="splitField" indexed="false" stored="true"/>
<field name="CreatedDate" type="splitField" indexed="true" stored="true"/>
<field name="CreatedTime" type="splitField" indexed="true" stored="true"/>
</fields>
orderHistoryID is still being mapped, but the SPECIAL_COLUMN_KEY value is not being split into the four fields described above. I'm sure that I'm just not doing something quite right with the PatternTokenizerFactory. I've also looked at the DataImportHandler RegexTransformer, but that only seems to work with RDBMS and XML imports.
Essentially, my data maps like this in Solr:
orderHistoryID=1000072459
SPECIAL_COLUMN_KEY=0800000002||1294034400000|113942
And I'm trying to get it to map like this:
orderHistoryID=1000072459
AccountNumber=0800000002
ActionFlag=
CreatedDate=1294034400000
CreatedTime=113942
Could someone please point me in the right direction?
An easier way to solve this problem would be to use SolrJ. Assuming that you already have an API to read records from Cassandra, you can split SPECIAL_COLUMN_KEY in your own code and feed the resulting fields to Solr using SolrJ.
The other way would be to create a custom POJO whose fields carry SolrJ's @Field annotation. For example:
import org.apache.solr.client.solrj.beans.Field;
public class CustomRecord {
@Field
private String orderHistoryID;
@Field
private String AccountNumber;
@Field
private String ActionFlag;
@Field
private String CreatedDate;
@Field
private String CreatedTime;
}
and then use
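SolrServer server = new HttpSolrServer("http://HOST:8983/solr/");
// customRecord is a populated CustomRecord instance
server.addBean(customRecord);
// commit so the added document becomes visible to searches
server.commit();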
For more details, refer to the SolrJ documentation on directly adding POJOs to Solr.
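With either approach, the pipe-delimited value has to be split before it reaches Solr, since a tokenizer only breaks a single field's text into terms and does not distribute them across other fields. Here is a minimal sketch of that split step in plain Java. The class name SpecialColumnKeySplitter and its split method are hypothetical, and the field order is assumed from the mapping shown in the question. Note that the pipe is escaped, because an unescaped | is the alternation operator in a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ or DSE.
public class SpecialColumnKeySplitter {

    // Field order assumed from the example row in the question.
    private static final String[] FIELD_NAMES = {
        "AccountNumber", "ActionFlag", "CreatedDate", "CreatedTime"
    };

    // Splits a value such as "0800000002||1294034400000|113942"
    // into named fields, keeping empty segments.
    public static Map<String, String> split(String raw) {
        // "\\|" matches a literal pipe; the -1 limit keeps trailing empty strings.
        String[] parts = raw.split("\\|", -1);
        if (parts.length != FIELD_NAMES.length) {
            throw new IllegalArgumentException(
                "Expected " + FIELD_NAMES.length + " segments but got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            fields.put(FIELD_NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = split("0800000002||1294034400000|113942");
        fields.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The resulting values can then be copied into the CustomRecord fields, or set one by one on a SolrJ document, before sending it to Solr.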