Search code examples
solrdih

The simplest Solr DIH indexing


I'm trying to index data from a database in Solr using the DIH.

So I have modified the two config files as follows:

solrconfig.xml :

<requestHandler name="/dataimport" 
    class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
    <str name="config">data-config.xml</str>
</lst>
</requestHandler>

data-config.xml :

<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="****"/>
    <document>
        <entity name="source_scellee" query="select * from source_scellee">
        </entity>
    </document>
</dataConfig>

source_scellee being the name of my table on my test database. It contains many fields.

Obviously, I'm trying to run nothing else than a simple test. When running http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true I get the following result :

<str name="Full Dump Started">2012-01-27 12:27:01</str><str name="">Indexing completed. Added/Updated: 4 documents. Deleted 0 documents.</str><str name="Committed">2012-01-27 12:27:02</str>
<str name="**Total Documents Failed**">4</str>

Besides no warning nor error on the server logs. 4 is my number of records inside table "source_scellee". But it says all documents fail.

If I run a query from http://localhost:8983/solr/admin/ no results appear, at all !! How can I solve it ? (":" shows no results)

Thank you for your help!!!

----edit--- I have added these lines to my schema.xml :

<field name="ID" type="int" indexed="true" stored="true" />
    <field name="reference_catalogue"  type="string" indexed="true" stored="true"/>
    <field name="reference_capsule"  type="string" indexed="true" stored="true"/>
    <field name="organisme_certificateur"  type="string" indexed="true" stored="true" />
    <field name="reference_certificat"  type="string" indexed="true" stored="true" />
    <field name="duree_d_utilisation"  type="string" indexed="true" stored="true" />
    <field name="activite_nominale"   type="string" indexed="true" stored="true"/>
    <field name="activite_minimale"   type="string" indexed="true" stored="true"/>
    <field name="activite_maximale"   type="string" indexed="true" stored="true"/>
    <field name="coffret"  type="boolean" indexed="true" stored="true"/>
    <field name="dispositif_medical"  type="boolean" indexed="true" stored="true"/>
    <field name="forme_speciale" type="boolean" indexed="true" stored="true" />
    <field name="exemption_cpa"  type="boolean" indexed="true" stored="true"/>
    <field name="marquage_ce"  type="boolean" indexed="true" stored="true"/>
    <field name="element_cible"  type="boolean" indexed="true" stored="true"/>

However the result is still the same: no results when querying (I tried to restart solr, and to re-index all also)

------second edit--- I have tried the dynamic import Now my data-config.xml looks like this :

<document>
       <entity name="source_scellee" query="select * from source_scellee">
            <field column="ID" name="ID_i" />
            <field column="reference_catalogue" name="reference_catalogue_s" />
            <field column="reference_capsule" name="reference_capsule_s" />
            <field column="organisme_certificateur" name="organisme_certificateur_s" />
            <field column="reference_certificat" name="reference_certificat_s" />
            <field column="duree_d_utilisation" name="duree_d_utilisation_s" />
            <field column="activite_nominale" name="activite_nominale_s" />
            <field column="activite_minimale" name="activite_minimale_s" />
            <field column="activite_maximale" name="activite_maximale_s" />
            <field column="coffret" name="coffret_b" />
            <field column="dispositif_medical" name="dispositif_medical_b" />
            <field column="forme_speciale" name="forme_speciale_b" />
            <field column="exemption_cpa" name="exemption_cpa_b" />
            <field column="marquage_ce" name="marquage_ce_b" />
            <field column="element_cible" name="element_cible_b" />
        </entity>
    </document>

Solution

  • 1.) You can take a look to the statistics page to see, how much docs are indexed right now: http://localhost:8983/solr/admin/stats.jsp

    2.) The result of your search depends on your schema.xml, because there it's defined how docs are indexed/stored, which fields are processed and how searchs are handled on query time. Please take a look at this file or post the field definition from the schema.xml and also the schema/design from your table source_scellee. Does the columns and the fields have the same name?

    //Edit: This should work, if coulmname and filedname are the same:

    <document>
           <entity name="source_scellee" 
                   pk="ID"
                   query="select * from source_scellee">
    
            </entity>
        </document>
    

    is having NULL values in data an issue ?

    that depends on the destination field.

    Are your running solr in an tomcat or someting like that? Take a look in the Java EE Container output, like catalina.out or so.