Apache Solr still keeps old data after delta import

I am using Solr 7.6.

I do a full import from MySQL. The customer table looks like this:

customer_id pk   int
customer_code    varchar
name             varchar
update_datetime  timestamp

I modify one record, changing

customer_id    customer_code    name

46027          C1               zxc

to

customer_id    customer_code    name

46027          C1               789

Then I do a delta import with a data-config that looks like this:


  <dataSource type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/test" user="test" password="123456"/>

    <entity name="customer" pk="customer_id"
            query="select customer_id, customer_code, name from customer"
            deltaImportQuery="select customer_id, customer_code, name from customer where customer_id='${}'"
            deltaQuery="select customer_id from customer where update_datetime &gt; '${dih.last_index_time}'"/>

The delta import succeeds, and Solr returns the new record for the query name:789.

However, when I query with the old value, name:zxc, Solr still returns the old data.


Why? And how can I make Solr delete the old data when a record has been updated?

customer_id is the primary key, of type int, in MySQL.

I added customer_id and name to the Solr schema and set the type of customer_id to pint.

The next screenshot shows the Solr schema page; it says the unique key field is id.

[Screenshot: Solr schema page, unique key field is id]

---------------- UPDATE -------------------

The managed-schema.xml is:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
<schema name="default-config" version="1.6">
  <fieldType name="ancestor_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
  </fieldType>
  <fieldType name="binary" class="solr.BinaryField"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>

  <!-- field : delimited_payloads_float, delimited_payloads_int, 
  delimited_payloads_string, descendent_path, location, location_rpt, ... -->

  <!-- field starts with p, e.g. pdate -->

  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>

  <!-- field name starts with text_-->

  <field name="customer_id" type="pint" uninvertible="true" indexed="true" stored="true"/>
  <field name="name" type="text_en" uninvertible="true" indexed="true" stored="true"/>
  <field name="_root_" type="string" docValues="false" indexed="true" stored="false"/>
  <field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false"/>
  <field name="_version_" type="plong" indexed="false" stored="false"/>
  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>

  <!-- default dynamic fields -->


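  • Since you don't have a value for the id field, Solr is generating a unique one for you. Each import of the same row therefore gets a different id, so the updated row is added as a new document instead of overwriting the old one, and the old value stays searchable. You'll have to either include an id that is actually the unique id for the document you're submitting, or change the uniqueKey definition. I suggest doing the first, as it can then easily be changed later if necessary.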

    If customer_id uniquely identifies the document, add customer_id AS id, .. to your SQL SELECT statements:

    SELECT customer_id AS id, customer_id, customer_code, name FROM customer
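
    Applied to the data-config from the question, the change might look roughly like the sketch below. It is a sketch under assumptions, not a tested configuration: the `<dataConfig>` and `<document>` wrappers are not shown in the question and are assumed here, and the delta placeholder, which appears only as `${}` in the question, is assumed to be `${dih.delta.customer_id}`, the form DIH normally uses for the key returned by deltaQuery.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test" user="test" password="123456"/>
  <document>
    <!-- customer_id is selected a second time as id so that it fills the
         schema's uniqueKey field (id); a re-imported row should then
         overwrite the existing document instead of adding a new one. -->
    <!-- Assumption: ${dih.delta.customer_id} stands in for the ${} shown
         in the question. -->
    <entity name="customer" pk="customer_id"
            query="SELECT customer_id AS id, customer_id, customer_code, name FROM customer"
            deltaImportQuery="SELECT customer_id AS id, customer_id, customer_code, name FROM customer WHERE customer_id = '${dih.delta.customer_id}'"
            deltaQuery="SELECT customer_id FROM customer WHERE update_datetime &gt; '${dih.last_index_time}'"/>
  </document>
</dataConfig>
```

    Documents already indexed under generated ids will probably not go away by themselves; running a full import again (which by default cleans the index first) is likely the simplest way to clear them out.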