I have some XML to ingest into Solr, which sounds like a use case that is intended to be solved by the DataImportHandler. What I want to do is pull the column name from one XML attribute and the value from another attribute. Here is an example of what I mean:
<document>
<data ref="reference.foo">
<value>bar</value>
</data>
</document>
From this xml snippet, I want to add a field with name reference.foo
and value bar
. The DataImportHandler includes a XPathEntityProcessor for processing XML documents. I've tried using it and it works perfectly if I give it a known column name (e.g, <field column="ref" xpath="/document/data/@ref">
) but have not been able to find any documentation or examples to suggest either how to do what I want, or that it cannot be done. So:
I haven't managed to find a way to do this without bringing in a transformer, but by using a simple ScriptTransformer
I worked it out. It goes something like this:
...
<script>
function makePair(row) {
var theKey = row.get("theKey");
var theValue = row.get("theValue");
row.put(theKey, theValue);
row.remove("theKey");
row.remove("theValue");
return row;
}
</script>
...
<entity name="..."
processor="XPathEntityProcessor"
transformer="script:makePair"
forEach="/document"
...>
<field column="theKey" xpath="/document/data/@ref" />
<field column="theValue" xpath="/document/data/value" />
</entity>
...
Hope that helps someone!
Note, if your dynamicField is multivalued, you have to iterate over theKey since row.get("theKey") will be a list.