I am trying to fill a Solr index from two different data sources (XML and DB) using the DataImportHandler.
1st try: I created two data-config.xml files, one for the XML import and one for the DB import. The DB config would read id and, let's say, field A. The XML config would also read id, plus field B.
That works for both (I could import from both data sources), but the index got overwritten each time (with clean=false, of course), so I had either id and A or id and B.
So on to the 2nd try: I merged the two files into one:
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource
    name="cr-db"
    jndiName="xyz"
    type="JdbcDataSource" />
  <dataSource
    name="cr-xml"
    type="FileDataSource"
    encoding="utf-8" />
  <document name="doc">
    <entity
      dataSource="cr-xml"
      name="f"
      processor="FileListEntityProcessor"
      baseDir="/path/to/xml"
      filename="*.xml"
      recursive="true"
      rootEntity="false"
      onError="skip">
      <entity
        name="xml-data"
        dataSource="cr-xml"
        processor="XPathEntityProcessor"
        forEach="/root"
        url="${f.fileAbsolutePath}"
        transformer="DateFormatTransformer"
        onError="skip">
        <field column="id" xpath="/root/id" />
        <field column="A" xpath="/root/a" />
      </entity>
      <entity
        name="db-data"
        dataSource="cr-db"
        query="
          SELECT
            id, b
          FROM
            a_table
          WHERE
            id = '${f.file}'">
        <field column="B" name="b" />
      </entity>
    </entity>
  </document>
</dataConfig>
The id = '${f.file}' part is a bit odd, I guess, but that is the id that is used. The SELECT statement is formed correctly, but I get an exception when I try to run that file in dataimport.jsp. The first part (XML) works fine, but when it gets to the DB part it raises:
java.lang.RuntimeException: java.io.FileNotFoundException:
Could not find file: SELECT id, b FROM a_table WHERE id = '12345678.xml'
at org.apache.solr.handler.dataimport.FileDataSource.getFile[..]
Any advice? Thanks in advance
EDIT
I found the cause of the FileNotFoundException: within the entity tags, the datasource attributes need to be camelCased, i.e. dataSource.
Now it runs through, but with the same outcome as in the first try: only field B gets into the index. If I take the DB entity out, the file contents are indexed (field A).
Try:
<entity name="db-data" dataSource="cr-db"
Attribute names are case-sensitive, so your wrongly cased attribute is ignored and the entity falls back to the default data source (which here somehow turns out to be the file one).
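For reference, here is a minimal sketch of how the complete db-data entity would look with the attribute spelled in camelCase. It uses only the names from the question's config (cr-db, a_table, ${f.file}) and assumes the rest of the file stays unchanged. It addresses the FileNotFoundException only, not the question of why just one of the two fields ends up in the index.

```xml
<!-- Child entity that reads from the JDBC data source.
     The attribute must be written "dataSource"; a lowercase
     "datasource" would not be picked up, as described above. -->
<entity
  name="db-data"
  dataSource="cr-db"
  query="SELECT id, b FROM a_table WHERE id = '${f.file}'">
  <field column="B" name="b" />
</entity>
```

The query is written on one line here purely for readability; the multi-line form from the question should work the same way.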