
Solr: when indexing custom JSON, same-named fields from all child objects are stored in one field



I am trying to create an index from a JSON file using Solr 8.11.
Here is the content of my JSON file:
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": "55236",
    "cards": [
        {
            "title": "hood",
            "title_index": "hood",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        },
        {
            "title": "'s Gravenhage",
            "title_index": "'s Gravenhage",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        },
        {
            "title": "'tween",
            "title_index": "'tween",
            "text": "<div class=m-l-15>definition</div> ",
            "text_index": "definition"
        }
    ]
}

I expect to receive the following:

{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "hood",
    "text": "<div class=m-l-15>definition</div> "
},
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "'s Gravenhage",
    "text": "<div class=m-l-15>definition</div> "
},
{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": "'tween",
    "text": "<div class=m-l-15>definition</div> "
}

But I get this:

{
    "dict": "TEST En-En",
    "index_language": "En",
    "contents_language": "En",
    "lang": "En-En",
    "type": "explanatory",
    "words_count": 55236,
    "title": [
        "hood",
        "'s Gravenhage",
        "'tween"
    ],
    "text": [
        "<div class=m-l-15>definition</div> ",
        "<div class=m-l-15>definition</div> ",
        "<div class=m-l-15>definition</div> "
    ]
}

That is, the title values of all cards are stored in one multi-valued title field, and the same happens with text.
Here is the schema:

  <field name="id" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="dict" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="index_language" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="contents_language" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="lang" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="type" type="string" multiValued="false" indexed="true" stored="true"/>
  <field name="words_count" type="tint"/>
  <field name="text" type="text_general"/>
  <field name="title" type="text_general"/>
  <field name="text_index" type="text_general" indexed="true" stored="false"/>
  <field name="title_index" type="text_general" indexed="true" stored="false"/>

This is the request:

path=/update/json/docs params={?split=/cards
&commitWithin=1000
&f=dict:/dict
&f=index_language:/index_language
&f=contents_language:/contents_language
&f=lang:/lang
&f=type:/type
&f=words_count:/words_count
&f=title:/cards/title
&f=title_index:/cards/title_index
&f=text:/cards/text
&f=text_index:/cards/text_index
+-H+'Content-type:application/json'
&overwrite=true
&wt=json}

According to the documentation, I should get what I expect.
Please tell me what I am doing wrong.


Solution

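  • There is a stray ? in front of the split parameter, so Solr receives a parameter named ?split instead of split and never splits the input at /cards. The whole file is then indexed as a single document, which is why every /cards/title and /cards/text value ends up in one multi-valued field. Remove the extra ? and it should work.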
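
To make the effect of the stray ? visible, here is a minimal Python sketch. It uses the standard library's parse_qsl as a stand-in for the server's parameter parsing (an assumption: Solr's own parser is not shown here, but the logged params suggest it behaves the same way for this case). The host and the collection name "dictionary" are hypothetical. The sketch then builds the same request URL from a list of parameters, so the only ? is the one that separates the path from the query string.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical host and collection name; substitute your own.
BASE_URL = "http://localhost:8983/solr/dictionary/update/json/docs"

# Query string as logged in the question (shortened); note the leading "?".
logged_query = "?split=/cards&commitWithin=1000&f=dict:/dict&f=title:/cards/title"
logged_names = [name for name, _ in parse_qsl(logged_query)]

# The same parameters as a list of pairs, so no stray "?" can slip in.
# The Content-type belongs in an HTTP header, not in the query string.
params = [
    ("split", "/cards"),
    ("commitWithin", "1000"),
    ("f", "dict:/dict"),
    ("f", "index_language:/index_language"),
    ("f", "contents_language:/contents_language"),
    ("f", "lang:/lang"),
    ("f", "type:/type"),
    ("f", "words_count:/words_count"),
    ("f", "title:/cards/title"),
    ("f", "title_index:/cards/title_index"),
    ("f", "text:/cards/text"),
    ("f", "text_index:/cards/text_index"),
    ("overwrite", "true"),
    ("wt", "json"),
]
fixed_url = BASE_URL + "?" + urlencode(params)
fixed_names = [name for name, _ in parse_qsl(fixed_url.split("?", 1)[1])]

print(logged_names)  # the first name is "?split", so there is no "split" parameter
print(fixed_names)   # the first name is "split"
print(fixed_url)
```

The resulting URL can be used with any HTTP client, sending the JSON file as the request body with a Content-type: application/json header.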