Solr: DIH for multilingual index & multiValued field?


I have a MySQL table:

CREATE TABLE documents (
    id INT NOT NULL AUTO_INCREMENT,
    language_code CHAR(2),
    tags CHAR(30),
    text TEXT,
    PRIMARY KEY (id)
);

I have 2 questions about Solr DIH:

1) The language_code field indicates which language the text field is written in. Depending on that language, I want to index text into a different Solr field.

# pseudo code

if language_code == "en":
    index "text" to Solr field "text_en"
elif language_code == "fr":
    index "text" to Solr field "text_fr"
elif language_code == "zh":
    index "text" to Solr field "text_zh"
...

Can DIH handle a use case like this? How do I configure it to do so?

2) The tags field needs to be indexed into a Solr multiValued field. Multiple values are stored in a single string, separated by commas. For example, if tags contains the string "blue, green, yellow", then I want to index the 3 values "blue", "green", "yellow" into a Solr multiValued field.

How do I do that with DIH?

Thanks.


Solution

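  • First, your schema needs to accept the per-language fields, for example through a dynamic field like the one below. (The string type is not tokenized; for real full-text search you would probably want a language-specific analyzed type per field instead.)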

    <dynamicField name="text_*" type="string" indexed="true" stored="true" />
    

    Then, in your DIH config, attach a script transformer to the entity:

    <entity name="document" dataSource="ds1" transformer="script:ftextLang" query="SELECT * FROM documents" />
    

    The script itself is defined just below the dataSource element:

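    <script><![CDATA[
      // Copy "text" into a language-specific field such as text_en or text_fr.
      function ftextLang(row) {
        var lang = row.get('language_code');
        var value = row.get('text');
        // language_code is nullable in the table; skip rows that would yield "text_null".
        if (lang != null && value != null) {
          row.put('text_' + lang, value);
        }
        return row;
      }
    ]]></script>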
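
  • For the second question, one option that may work is DIH's RegexTransformer: its splitBy attribute takes a regular expression and turns a single column value into multiple values. The sketch below shows how the pieces above might fit together in one data-config.xml. The JDBC driver, URL, user, and password are placeholders that do not come from the original post, and the whole file is an untested sketch, not a verified configuration.

    ```xml
    <dataConfig>
      <!-- Placeholder connection details (assumptions); replace with your own. -->
      <dataSource name="ds1" type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb"
                  user="solr" password="secret" />

      <script><![CDATA[
        // Copy "text" into a language-specific field such as text_en or text_fr.
        function ftextLang(row) {
          var lang = row.get('language_code');
          var value = row.get('text');
          if (lang != null && value != null) {
            row.put('text_' + lang, value);
          }
          return row;
        }
      ]]></script>

      <document>
        <!-- Transformers are chained in the order listed. -->
        <entity name="document" dataSource="ds1"
                transformer="RegexTransformer,script:ftextLang"
                query="SELECT * FROM documents">
          <!-- Split "blue, green, yellow" on commas plus any following whitespace. -->
          <field column="tags" splitBy=",\s*" />
        </entity>
      </document>
    </dataConfig>
    ```

    The target field in the schema would also need to be declared multiValued, for example <field name="tags" type="string" indexed="true" stored="true" multiValued="true" />.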