Solr: Is it possible to change date format for a specific field using only the Schema API?


I would like to specify the date format dd/MM/yyyy for a field of type date. I know the following methods:

  1. edit schema.xml and add the datetimeformat="dd/MM/yyyy" attribute to the <field /> tag involved, but I haven't tested it. Or,
  2. edit solrconfig.xml and add a <str>dd/MM/yyyy</str> tag to the processor of class solr.ParseDateFieldUpdateProcessorFactory. I'm sure this works because I've personally tested it.

I would like to use the managed schema and the Schema API instead of editing schema.xml. That approach is convenient in both standalone Solr and SolrCloud.

In order to add a date field, I do as follows:

curl http://localhost:8983/solr/test/schema -X POST -H 'Content-type:application/json' --data-binary '
{   
  "add-field":
  {
    "name":"mydate",     
    "type":"date",
    "stored":true, 
    "indexed":true
  }
}'

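and to edit some field properties, like the stored property, I do the following (replace-field expects the full field definition, so the type is repeated):

curl -X POST -H 'Content-type:application/json' --data-binary '
{
  "replace-field":
  {
    "name":"mydate",
    "type":"date",
    "stored":false,
    "indexed":true
  }
}' http://localhost:8983/solr/test/schema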
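
To check what the schema holds after an add-field or replace-field call, the Schema API can return a single field's definition. The sketch below only builds the URL; schema_field_url is a hypothetical helper name, and the host, collection and field are the ones assumed throughout this question.

```shell
#!/bin/sh
# Assumed base URL, matching the commands in this question.
SOLR_URL="http://localhost:8983/solr"

# Hypothetical helper: build the Schema API URL for one field's definition.
schema_field_url() {
  # $1 = collection name, $2 = field name
  printf '%s/%s/schema/fields/%s' "$SOLR_URL" "$1" "$2"
}

# With Solr running, the field definition could be fetched like this:
#   curl "$(schema_field_url test mydate)"
```

The response should list the field's current properties (type, stored, indexed), which makes it easy to confirm whether a change was applied.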

If I try to set "datetimeformat":"dd/MM/yyyy" when creating or editing the field, I get an error.

Is it possible to edit the date format using only the Schema API without editing any *.xml file?

UPDATE

I tried this command without any success:

curl http://localhost:8983/solr/test/config -H 'Content-type:application/json' -d '
{
  "update-updateprocessor" : 
  {
    "class": "solr.ParseDateFieldUpdateProcessorFactory", 
    "name":"solr.ParseDateFieldUpdateProcessorFactory",
    "format":["dd/MM/yyyy"]
  }
}'

The problem is that the original definition of solr.ParseDateFieldUpdateProcessorFactory in solrconfig.xml is:

<processor class="solr.ParseDateFieldUpdateProcessorFactory">
  <arr name="format">
    <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
    <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
    <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
    <str>yyyy-MM-dd'T'HH:mm:ss</str>
    <str>yyyy-MM-dd'T'HH:mmZ</str>
    <str>yyyy-MM-dd'T'HH:mm</str>
    <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
    <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
    <str>yyyy-MM-dd HH:mm:ss.SSS</str>
    <str>yyyy-MM-dd HH:mm:ss,SSS</str>
    <str>yyyy-MM-dd HH:mm:ssZ</str>
    <str>yyyy-MM-dd HH:mm:ss</str>
    <str>yyyy-MM-dd HH:mmZ</str>
    <str>yyyy-MM-dd HH:mm</str>
    <str>yyyy-MM-dd</str>
  </arr>
</processor>

and it doesn't have a name attribute. If I omit the "name" attribute in the JSON request, Solr throws the error 'name' is a required field. I tried various values, but none worked: "name":"solr.ParseDateFieldUpdateProcessorFactory", "name":"ParseDateFieldUpdateProcessorFactory", "name":"".

UPDATE 2

Running curl http://localhost:8983/solr/test/config returns a JSON object. Here's a portion of it:

{
...
    "updateRequestProcessorChain":[{
    "name":"add-unknown-fields-to-the-schema",
    "":[{"class":"solr.UUIDUpdateProcessorFactory"},
      {"class":"solr.LogUpdateProcessorFactory"},
      {"class":"solr.DistributedUpdateProcessorFactory"},
      {"class":"solr.RemoveBlankFieldUpdateProcessorFactory"},
      {
        "class":"solr.FieldNameMutatingUpdateProcessorFactory",
        "pattern":"[^\\w-\\.]",
        "replacement":"_"},
      {"class":"solr.ParseBooleanFieldUpdateProcessorFactory"},
      {"class":"solr.ParseLongFieldUpdateProcessorFactory"},
      {"class":"solr.ParseDoubleFieldUpdateProcessorFactory"},
      {"class":"solr.ParseDateFieldUpdateProcessorFactory"},
      {"class":"solr.AddSchemaFieldsUpdateProcessorFactory"},
      {"class":"solr.RunUpdateProcessorFactory"}]}],
...
}

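This means that solr.ParseDateFieldUpdateProcessorFactory is an unnamed processor inside an updateRequestProcessorChain (here, add-unknown-fields-to-the-schema), not a standalone named update processor. The documentation states: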

The Config API does not let you create or edit <updateRequestProcessorChain> elements. However, it is possible to create <updateProcessor> entries and can use them by name to create a chain.

This means that it's not possible to add a specific date format to the existing solr.ParseDateFieldUpdateProcessorFactory using the Config API. Instead, I should create a custom update processor that does what I want by calling the add-updateprocessor command with the proper parameters.


Solution

  • After struggling with the horrific Solr documentation, I found a solution. The documentation states:

    The Config API does not let you create or edit <updateRequestProcessorChain> elements. However, it is possible to create <updateProcessor> entries and can use them by name to create a chain.

    [ ... ]

    You can use this directly in your request by adding a parameter in the <updateRequestProcessorChain> for the specific update processor called processor=firstFld.

    This means that I have to add a custom update processor and invoke it explicitly when using the /update handler. So:

    curl http://localhost:8983/solr/test/config -H 'Content-type:application/json' -d '
    {
      "add-updateprocessor" : 
      {
        "name" : "myCustomDateUpdateProcessor", 
        "class": "solr.ParseDateFieldUpdateProcessorFactory", 
        "format":["dd/MM/yyyy"]
      }
    }'
    

    To load the data into the test collection via the /update/csv handler, use this command (the URL is quoted so that the shell does not treat & as a background operator):

    curl 'http://localhost:8983/solr/test/update/csv?processor=myCustomDateUpdateProcessor&commit=true' --data-binary @file.csv -H 'Content-type:text/plain; charset=utf-8'
    

    Note the presence of processor=myCustomDateUpdateProcessor, where myCustomDateUpdateProcessor is the update processor I created before. The processor is stored in configoverlay.json and not in solrconfig.xml.
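
As a rough end-to-end sketch of the steps above: the helpers below build the Config API overlay URL (which should show what ended up in configoverlay.json), build the /update/csv URL with the processor parameter, and write a small sample file. The function names and the CSV rows are hypothetical; the host, collection, field and processor names are the ones used in this answer.

```shell
#!/bin/sh
# Assumed values, taken from the commands in this answer.
SOLR_URL="http://localhost:8983/solr"
COLLECTION="test"

# Hypothetical helper: URL of the Config API view of configoverlay.json.
overlay_url() {
  printf '%s/%s/config/overlay' "$SOLR_URL" "$COLLECTION"
}

# Hypothetical helper: /update/csv URL that routes documents through
# the update processor named in $1.
update_csv_url() {
  printf '%s/%s/update/csv?processor=%s&commit=true' "$SOLR_URL" "$COLLECTION" "$1"
}

# Hypothetical sample data: a header row plus two rows with dd/MM/yyyy dates.
write_sample_csv() {
  printf 'id,mydate\n1,25/12/2015\n2,01/02/2016\n' > "$1"
}

# With Solr running, the sequence would look like this:
#   write_sample_csv file.csv
#   curl "$(overlay_url)"
#   curl "$(update_csv_url myCustomDateUpdateProcessor)" \
#        --data-binary @file.csv -H 'Content-type:text/plain; charset=utf-8'
```

Wrapping the URL in double quotes via "$(...)" keeps the & inside the query string, so both processor and commit reach Solr as parameters.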