java solr solrj

How does one write maintainable Solr code?


In our project we have a Solr schema with multiple near-duplicate fields. For example, a field Field is stored in Solr as field, field_w and field_l, and each of them has a different boost factor in search (the actual dynamic types are not _w or _l, but similar).

As a result, we have a Model which we map to a SolrSchemaModel through custom code and then store in Solr. When we read from Solr, we read a SolrDocumentList (not a SolrSchemaModel, because it has embedded documents which are mapped to __childDocuments__ on read) and construct a ModelSearchResponse (not a Model, because some fields are missing).

As you can see, this leads to a lot of maintenance: whenever we add fields to Model or change the schema, we also need to change SolrSchemaModel and all the code mapping to and from it.

How have others handled persistence with Solr? One idea bouncing around was to store a JSON serialisation of the class in a Solr field; that way only the write path changes whenever the schema or Model changes, and the serialisation/deserialisation remains intact. Another person suggested not using Solr for persistence at all and keeping a separate store (which I guess would mean reading from another database after each search, before returning results).

How have people solved this? We are using Java 8 with SolrJ, if that is relevant.


Solution

  • There are a couple of things here:

    1. If you are copying fields to other fields for different analysis, you don't need to store the other fields, just index them. So you only need a copyField at the Solr level and don't have to change your serialization model.
    2. The classical way to avoid tracking schema equivalence is dynamic fields, with a prefix or suffix in the name indicating the type. For example, all *_s fields are strings and all *_dt fields are dates (in Solr's default configset, *_d is a double). Your mapper could probably even add and strip the suffix automatically. That's what most CMSs do when talking to Solr.
    3. copyField supports wildcards for source and target fields, so you can still combine the techniques above.
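
To make point 2 concrete, here is a minimal sketch of a mapper that adds and strips type suffixes automatically. It uses plain maps rather than SolrJ types so it stays self-contained. The class name DynamicFieldMapper, its method names, and the suffix table are all assumptions for illustration; the suffixes must match whatever dynamicField rules your own schema defines.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ.
public class DynamicFieldMapper {

    // Assumed suffix convention; adjust to your schema's dynamicField rules.
    private static final String[] KNOWN_SUFFIXES = {"_s", "_i", "_l", "_b", "_dt"};

    // Pick a dynamic-field suffix from the Java type of the value.
    static String suffixFor(Object value) {
        if (value instanceof String)  return "_s";
        if (value instanceof Integer) return "_i";
        if (value instanceof Long)    return "_l";
        if (value instanceof Boolean) return "_b";
        if (value instanceof Date)    return "_dt";
        throw new IllegalArgumentException("No suffix for " + value.getClass());
    }

    // Model field names -> Solr field names (null values are skipped).
    static Map<String, Object> toSolrFields(Map<String, Object> model) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : model.entrySet()) {
            if (e.getValue() == null) continue;
            out.put(e.getKey() + suffixFor(e.getValue()), e.getValue());
        }
        return out;
    }

    // Solr field names -> model field names; unknown names pass through unchanged.
    static Map<String, Object> fromSolrFields(Map<String, Object> solrFields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : solrFields.entrySet()) {
            out.put(stripSuffix(e.getKey()), e.getValue());
        }
        return out;
    }

    private static String stripSuffix(String name) {
        for (String suffix : KNOWN_SUFFIXES) {
            if (name.endsWith(suffix) && name.length() > suffix.length()) {
                return name.substring(0, name.length() - suffix.length());
            }
        }
        return name;
    }
}
```

With SolrJ, each entry returned by toSolrFields could be passed to SolrInputDocument.addField(name, value), and fromSolrFields could be applied to the fields of each SolrDocument on read. Adding a field to Model then needs no schema or mapping change, as long as its type already has a suffix. The differently analysed copies from point 1 stay entirely on the Solr side as wildcard copyField rules.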