Solr DataImportHandler: Can I get a dynamic field name from xml attribute with XPathEntityProcessor?


I have some XML to ingest into Solr, which sounds like the kind of use case the DataImportHandler is meant to solve. What I want to do is pull the column name from one XML attribute and the value from a child element. Here is an example of what I mean:

<document>
  <data ref="reference.foo">
    <value>bar</value>
  </data>
</document>

From this XML snippet, I want to add a field with name reference.foo and value bar. The DataImportHandler includes an XPathEntityProcessor for processing XML documents. I've tried using it and it works perfectly if I give it a known column name (e.g., <field column="ref" xpath="/document/data/@ref">), but I have not been able to find any documentation or examples that show either how to do what I want or that it cannot be done. So:

  • Can I do this using XPathEntityProcessor? If so, how?
  • If not, can I do this some other way with DataImportHandler?
  • Or am I left with writing my own import handler?

Solution

  • I haven't found a way to do this without bringing in a transformer, but a simple ScriptTransformer does the job. It goes something like this:

    ...
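    <script><![CDATA[
    // Move the value into a field named after the key, then drop the helper columns.
    function makePair(row) {
      var theKey = row.get("theKey");
      var theValue = row.get("theValue");
    
      // Skip rows where the xpath matched nothing, so no null field name is added.
      if (theKey != null) {
        row.put(theKey, theValue);
      }
      row.remove("theKey");
      row.remove("theValue");
    
      return row;
    }
    ]]></script>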
    
    ...
    
    <entity name="..." 
      processor="XPathEntityProcessor" 
      transformer="script:makePair"
      forEach="/document"
      ...>
    
      <field column="theKey" xpath="/document/data/@ref" />
      <field column="theValue" xpath="/document/data/value" />
    </entity>
    ...
    

    Hope that helps someone!

    Note: if your dynamicField is multivalued, row.get("theKey") returns a list rather than a single string, so you have to iterate over it and add one field per entry instead of calling row.put once.
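
    For the multivalued case, a minimal sketch might look like the following. It is untested against a real Solr instance and rests on a few assumptions: that theKey and theValue both arrive as java.util.List objects (so size() and get(i) are available), that the two lists line up by position, and that makePairs is just a hypothetical name you would reference as transformer="script:makePairs". If some <data> elements have no <value>, the lists may not line up and this pairing would be wrong.

```javascript
// Hypothetical multivalued variant of makePair.
// Assumes "theKey" and "theValue" are parallel java.util.List objects.
function makePairs(row) {
  var keys = row.get("theKey");
  var values = row.get("theValue");

  if (keys != null && values != null) {
    // Pair entries by position; stop at the end of the shorter list.
    var n = Math.min(keys.size(), values.size());
    for (var i = 0; i < n; i++) {
      row.put(keys.get(i), values.get(i));
    }
  }

  // Drop the helper columns so they are not indexed as fields.
  row.remove("theKey");
  row.remove("theValue");

  return row;
}
```

    Because the body uses && and <, it needs to sit inside a CDATA section when placed in the <script> element of the data config.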