Search code examples
solrdataimporthandler

SOLR 7 DataImportHandler – Unable to get data into multivalued fields (via "Joined" table)


I'm attempting to use DIH to load our SOLR data.

I've done this on other SOLR cores/installations without issue, but for some reason I can't get it working on this installation.

The main data (media - mostly videos) (from the primary DIH query) loads just fine. The secondary query (a nested entity) that should be loading one-to-many records for video tags, does not populate the multivalued fields in SOLR. I don't see any error messages (at least not that I can find in the logs or anywhere else), so I'm not sure where it's going wrong.

The one thing I wonder about is that the "join" (the where clause in the nested query) does not use the primary key field - but I don't think this should matter. Please correct me if I'm wrong here.

Here's a simplified copy of the DIH config:

<dataConfig>
    <dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://x.x.x.x;databaseName=media;"
            user="xxx"
            password="xxx"
            batchSize="100" />

    <document name="mediaContent">
        <entity name="media" query="SELECT * FROM mediaContent" pk="mediaID">
          <field column="mediaID" name="mediaID" />
          <field column="mediaType" name="mediaType" />
          <field column="videoID" name="videoID" />
          <field column="videoTitle" name="videoTitle" />
          <field column="videoDescription" name="videoDescription" />
          <field column="videoStatusID" name="videoStatusID" />
          <field column="videoPublished" name="videoPublished" />

          <entity name="videoTags"
            query="
                SELECT tagID, tagTitle
                FROM videoTags
                WHERE videoID = '${mediaContent.videoID}'">
            <field column="tagID" name="videoTagIDs" />
            <field column="tagTitle" name="videoTagTitles" />
          </entity>
        </entity>
    </document>
</dataConfig>

The multivalued items in the managed-schema file are configured as per:

<field name="videoTagIDs" type="int" indexed="true" stored="true" required="false" multiValued="true" />
<field name="videoTagTitles" type="text" indexed="true" stored="true" required="false" multiValued="true" />

Normally I would use the primary key to join the data in the second query, but in this case, because not all of the content is videos, and the tags only related to the video content, I am not using the PK field. Instead, I'm using the videoID field. Again, I'm not sure if that matters here or not. I get the proper data when I run the queries in SQL.

If anyone has any suggestions as to how I can debug what's happening with the secondary query, or better yet, if anyone sees something in my config that I've done wrong, I'd appreciate your feedback.

Thanks!

Bill


Solution

  • So in the end this turned out to be the fault of case sensitivity. The fields coming from the SQL query did not exactly match the case configured in the DIH field names (videoID vs videoId). While this did not matter for the main query, and the data was imported there, it did matter for the second nested entity.

    The debugging never worked well, but debugging the queries actually being run on SQL server was helpful in seeing what was going on.