There is a table of n
products, and a table of features of these products. Each product has many features. Given a Solr DataImportHandler configuration:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select feature_name, description from feature where item_id='${item.ID}'">
<field name="feature_name" column="description" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Solr will run n + 1
queries to fetch this data. 1
for the main query, n
for the queries to fetch the features. This is inefficient for large numbers of items. Is it possible to configure Solr such that it will run these queries separately and join them in-memory instead? All rows from both tables will be fetched.
This can be done using CachedSqlEntityProcessor:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select item_id, feature_name, description from feature"
cacheKey="item_id"
cacheLookup="item.ID"
processor="CachedSqlEntityProcessor">
<field name="feature_name" column="description" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Since Solr's index is 'flat', feature_name
and description
are not connected in any way; each product
will have multi-valued fields for each of these.