solr, solr4, dspace

Solr: change a string field back to a multivalued integer field


Background: I sharded my statistics Solr core by year, using a DSpace command:

[dspace]/bin/dspace stats-util -s

I followed the documentation at: https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear

After that, there are several cores, one per year: statistics, statistics-2015, statistics-2014, and so on.

However, the multivalued fields are now incorrect. Each one seems to hold a single string:

"owningComm": [
      "8,2,1,2,1,1"
]

When we query, for example, owningComm:1, no results are returned.

The correct behaviour, before sharding, was as an "array" of integers:

"owningComm": [
      5,
      2,
      1,
      2,
      1,
      1
]

The field in schema.xml of Solr 4 is:

<field name="owningComm" type="integer" 
       indexed="true" stored="true" 
       required="false" multiValued="true" />

I already tried tokenizing the string on commas, but without success.

Is there any way to update this field to integers again? Removing the quotes or something like that?

We have millions of documents stored.


Solution

  • I took a look at some of my shard data, and I see the same result that you have reported. Interestingly, after the upgrade to either DSpace 4 or DSpace 5, I remember that I was unable to search by owningComm. I had presumed that this field had been dropped. Now I suspect that the issue you have reported was the underlying cause.

    I recommend reporting this issue as a DSpace bug: https://jira.duraspace.org/projects/DS/issues
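
If the data has to be repaired in the meantime, one possible approach is to export the affected documents, split the comma-joined string back into integers, and re-index them. The following is only a minimal sketch of the splitting step in Python, not a DSpace or Solr feature. The helper names `split_int_values` and `repair_document` are hypothetical, and the assumption that only owningComm needs repair comes from the example in the question; other multivalued fields may be affected as well.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Assumption: only owningComm is confirmed as broken by the question.
# Add other multivalued integer fields here if they show the same symptom.
MULTI_INT_FIELDS = ("owningComm",)


def split_int_values(values: Iterable[Any]) -> List[int]:
    """Expand values such as ["8,2,1,2,1,1"] into [8, 2, 1, 2, 1, 1]."""
    result: List[int] = []
    for value in values:
        # str() lets already-correct integer values pass through unchanged.
        for part in str(value).split(","):
            part = part.strip()
            if part:
                result.append(int(part))
    return result


def repair_document(doc: Dict[str, Any],
                    fields: Sequence[str] = MULTI_INT_FIELDS) -> Dict[str, Any]:
    """Return a copy of doc with the listed fields converted to lists of ints."""
    fixed = dict(doc)
    for name in fields:
        if name in fixed:
            raw = fixed[name]
            # A single value may come back as a scalar rather than a list.
            if not isinstance(raw, (list, tuple)):
                raw = [raw]
            fixed[name] = split_int_values(raw)
    return fixed


if __name__ == "__main__":
    broken = {"owningComm": ["8,2,1,2,1,1"]}
    print(repair_document(broken))
```

Sending the repaired documents back to Solr is outside this sketch. With millions of documents, it would be prudent to try the whole round trip on a copy of one shard first.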