I am working on making a book catalog searchable using Solr. I have written a query that gets all of the info that I am interested in using DataImportHandler. Every book may have multiple formats, and each format has its own ISBN, format name, and price, which are expressed as comma-separated values, as follows:
| id | title | isbns | prices | formats |
|----|---------|-------------|---------------|-----------|
| 1 | A Book | isbn1,isbn2 | price1,price2 | fmt1,fmt2 |
| 2 | Another | anisbn | aprice | aformat |
... ... ...
I am currently using a RegexTransformer with splitBy so that isbns, prices, and formats become multiValued fields for faceting. However, I would ideally like to pull out the values individually and store them in another field in the index. In other words, for the book with id 1 in the example, I would like to store the following fields as strings:
Field 1: "fmt1 (isbn1): price1"
Field 2: "fmt2 (isbn2): price2"
Is this sort of thing possible with Solr? I could always pull out the fields and process them on the application side, but since this Solr index will have multiple clients performing queries, I would rather store the extra values at the time I build the index.
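To make the target transformation concrete, here is a minimal Python sketch of the pairing described above. It is only an illustration of the desired output, not Solr code: the function name `combine_formats` is hypothetical, and it assumes the three columns always carry the same number of comma-separated entries in the same order.

```python
from typing import List


def combine_formats(isbns: str, prices: str, formats: str) -> List[str]:
    """Pair up the i-th format, ISBN, and price into one display string.

    Assumption: the three comma-separated columns are aligned by position.
    """
    isbn_list = [value.strip() for value in isbns.split(",")]
    price_list = [value.strip() for value in prices.split(",")]
    format_list = [value.strip() for value in formats.split(",")]

    # Refuse to guess when the columns are not the same length.
    if not (len(isbn_list) == len(price_list) == len(format_list)):
        raise ValueError("isbns, prices, and formats must have the same number of entries")

    return [
        f"{fmt} ({isbn}): {price}"
        for fmt, isbn, price in zip(format_list, isbn_list, price_list)
    ]


if __name__ == "__main__":
    for line in combine_formats("isbn1,isbn2", "price1,price2", "fmt1,fmt2"):
        print(line)
```

Raising on mismatched lengths is a design choice for the sketch; silently truncating with `zip` would hide misaligned rows.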
This is all explained in the DIH wiki: simply use the groupNames attribute to specify field names (the groups are regular regex capture groups).
EDIT:
groupNames : A comma separated list of field column names, used where the regex contains groups and each group is to be saved to a different field. If some groups are not to be named leave a space between commas.
In this example the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transforms it to two new target fields 'firstName' and 'lastName'. So even though the query returned only one column 'full_name' in the resultset the solr document gets two extra fields 'firstName' and 'lastName' which are 'derived' fields. These new fields are only created if the regexp matches.
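The behaviour described in the quoted passage can be sketched in plain Python. This is not Solr's implementation, only an illustration of the semantics as the wiki text states them: each capture group is copied into the field named at the same position, a blank name skips that group, and nothing is added when the regex does not match. The function name `apply_group_names` and the sample regex are hypothetical, and the use of a search-style match (rather than a full match) is an assumption.

```python
import re
from typing import Dict, Optional


def apply_group_names(
    row: Dict[str, Optional[str]],
    source_col: str,
    regex: str,
    group_names: str,
) -> Dict[str, Optional[str]]:
    """Return a copy of row with one derived field per named regex group."""
    result = dict(row)
    match = re.search(regex, row.get(source_col) or "")
    if match is None:
        # Derived fields are only created when the regex matches.
        return result

    for index, name in enumerate(group_names.split(","), start=1):
        name = name.strip()
        # A blank entry means "do not name this group"; also ignore
        # names that have no corresponding capture group.
        if name and index <= match.re.groups:
            result[name] = match.group(index)
    return result


if __name__ == "__main__":
    row = {"full_name": "Mr John Smith"}
    print(apply_group_names(row, "full_name", r"Mr (\w+) (\w+)", "firstName,lastName"))
```

Passing `" ,lastName"` as `group_names` would leave the first group unnamed and store only the second one.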