Search code examples
solrdih

SOLR: Inconsistencies using splitBy to populate a multi valued field


I'm having trouble using the splitBy functionality to populate a multi valued field from a pipe delimited datasource. My implementation seems to partially work for one of the field and does not the work for the the other field. An example of my implementation below.

I have a db view with following data:

recordId relist dbaName

1 PA21|MD29 The Hong Kong Dragon|The Peeled Apple

My config:

<dataConfig>
    <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@myserver:1521:XE" user="MyUser" password="MyPassword"/>
    <document>
        <entity name="mentity" query="select * from MySampleView" transformer="RegexTransformer" >
            <field sourceColName="relist" column="relist" splitBy="\|"  />               
            <field sourceColName="dbaName" column="dbaName" splitBy="\|"  />
            <field column="recordId" name="recordId" />
        </entity>  
    </document>
</dataConfig>

My schema.xml snippet:

<fields>    
    <field name="relist"      type="string"    indexed="true"  stored="true"  multiValued="true" /> 
    <field name="dbaName"      type="string"    indexed="true"  stored="true"  multiValued="true" />
    <field name="recordId"        type="string"    indexed="true"  stored="true"  multiValued="false" />
</fields>

<uniqueKey>recordId</uniqueKey>

My expectation is that when the data is split and stored, that the document would look something like this:

"docs": [ 
  { 
    "relist": [ 
      "PA21", 
      "MD29" 
    ], 
    "recordId": "1", 
    "dbaName": [ 
      "The Hong Kong Dragon", 
      "The Peeled Apple" 
    ] 
  } 
] 

However, this is what I get:

"docs": [ 
  { 
    "relist": [ 
      "PA21", 
      "MD29", 
      "PA21|MD29" 
    ], 
    "recordId": "1", 
    "dbaName": [ 
      "The Hong Kong Dragon|The Peeled Apple" 
    ] 
  } 
] 

My issue:

  1. relist is being split but it also includes the original delimited value in the list
  2. dbaName is not being split

Any help would be appreciated.

Thanks


Solution

  • I recently ran into this same issue, and discovered that if your field name has mixedCase, like dbaName, then splitby does not work without specifying the sourceColName. You can fix this by using a temporary name in your SQL query, like so:

        <entity name="mentity" query="select dbaName as dba_name from MySampleView" transformer="RegexTransformer" >
            <field sourceColName="dba_name" column="dbaName" splitBy="\|"  />
        </entity>  
    

    The other option would be to call your database column dba_name and your field name dba_name. Then splitBy works just fine, without specifying sourceColName.

    I found this quite confusing, so here is one more example, just to make it crystal clear to every one

    This WILL NOT work:

    select 'foo|bar' as genRes
    <field column="genres" sourceColName="genRes" splitBy="\|" />
    

    This WILL work

    select 'foo|bar' as genres
    <field column="genres" sourceColName="genres" splitBy="\|" />