Search code examples
solrdataimporthandlerdih

What is the difference between a Join Query and Embedded Entities in Solr DIH?


I am trying to index data across multiple tables using Solr's Data Import Handler. The official wiki on the DIH suggests using embedded entities to link multiple tables like so:

<document>
    <entity name="item" pk="id" query="SELECT * FROM item">
        <entity name="member" pk="memberid" query="SELECT * FROM member WHERE memberid='${item.memberid}'>
        </entity>
    </entity>
</document>

Another way that works is:

<document>
    <entity name="item" pk="id" query="SELECT * FROM item INNER JOIN member ON item.memberid=member.memberid">
    </entity>
</document>

Are these two methods functionally different? Is there a performance difference? My guess would be that the first method is to support non-SQL tables, but I'm not sure.

Another though would be that, if using join tables in MySQL, using the SQL query method with multiple joins could cause multiple documents to be indexed instead of one.


Solution

  • Few things that I encountered :-

    • If you have a one to one mapping you can use the join so that you get all the fields with one query itself.
    • If you have multiple records for the root you would use the sub entity which would probably create a multivalued field.
    • Sub entities fire a query for each of the records and hence are slower in performance.

    Would like to hear from Other users as well.