
Solr Data Import Handler skips or ignores requests


I am using Solr 6.0.0 and use the Data Import Handler (DIH) to index data from MySQL into Solr.

I have the following query in my db-data-config.xml file:

<entity name="user" query="SELECT ID, A, B, C FROM `USER` U WHERE U.ID = '${dataimporter.request.id}' OR '' = '${dataimporter.request.id}'">
    <field column="A" name="A" ....
    .......
</entity>

Basically, if I pass an id, only that row is indexed; otherwise the whole table is indexed.
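To see why one query covers both cases, here is a minimal Python sketch that mimics how the `${dataimporter.request.id}` placeholder resolves. The `resolve` helper is hypothetical and much simpler than DIH's real variable resolver; it assumes, as the query above already relies on, that a request parameter that was not sent resolves to an empty string.

```python
import re

# The entity query from db-data-config.xml above.
QUERY = ("SELECT ID, A, B, C FROM `USER` U "
         "WHERE U.ID = '${dataimporter.request.id}' "
         "OR '' = '${dataimporter.request.id}'")

REQUEST_PREFIX = "dataimporter.request."


def resolve(template, request_params):
    """Hypothetical stand-in for DIH variable substitution.

    Replaces each ${dataimporter.request.X} with the value of request
    parameter X, or with "" when that parameter was not sent.
    """
    def substitute(match):
        name = match.group(1)
        if name.startswith(REQUEST_PREFIX):
            return str(request_params.get(name[len(REQUEST_PREFIX):], ""))
        return ""  # other variable namespaces are out of scope here

    return re.sub(r"\$\{([^}]+)\}", substitute, template)


if __name__ == "__main__":
    # id=1: only "U.ID = '1'" can be true, so a single row is selected.
    print(resolve(QUERY, {"id": "1"}))
    # No id: "'' = ''" is always true, so every row is selected.
    print(resolve(QUERY, {}))
```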

The issue is this:

Rows are inserted and updated frequently.

After the first insert, I send a request to index that single id/doc/row. A few milliseconds later the record gets updated, so I send another request to index the same id. Solr seems to skip the second request: I can see that both requests were sent, but the updated data is not in Solr.

The same happens with different ids: if I send two or three data import requests at the same time, Solr skips/ignores the second one.

I send the request over HTTP, using the following URL:

http://localhost:8983/solr/user/dataimport?command=full-import&verbose=false&clean=false&commit=true&optimize=false&core=user&id=1
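If these requests are sent from application code rather than by hand, the same URL can be assembled with Python's standard library. A minimal sketch: `dataimport_url` and `trigger_import` are hypothetical helper names, and the parameters are copied from the request above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(base, core, doc_id):
    """Build the single-id full-import URL used in the question."""
    params = {
        "command": "full-import",
        "verbose": "false",
        "clean": "false",   # do not delete the existing index first
        "commit": "true",
        "optimize": "false",
        "core": core,
        "id": doc_id,       # read by the query as ${dataimporter.request.id}
    }
    return f"{base}/solr/{core}/dataimport?{urlencode(params)}"


def trigger_import(base, core, doc_id, timeout=10.0):
    """Send the request to a running Solr and return the response body."""
    with urlopen(dataimport_url(base, core, doc_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

This only sends the request; it does nothing by itself to prevent overlapping imports.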

Solution

  • If a DIH handler is busy running a request, it will ignore any additional request you send to it.

    So you need to review how and when you invoke DIH. Some ideas:

    • Since you have a high edit frequency, reindexing by id does not seem the best approach; something time-based is more scalable. You could add a 'lastUpdated' column (populated via a trigger whenever the row is created or updated), and then invoke the reindex every X minutes (1 min, 5 min, whatever you can afford). If one request is ignored, no data is lost: the rows that should have been reindexed will be reindexed by the next delta import that runs.
    • If you want to keep your id-based approach, you need to:
      • wait until the previous DIH request is done before you send a new one
      • allow for some buffer where you can keep adding ids while you wait
      • allow for multiple ids in your DIH config
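The time-based idea can be sketched in Python as a loop that fires a delta import on a fixed interval. This is a sketch under assumptions: it assumes the entity has been given a `deltaQuery`/`deltaImportQuery` (or an equivalent `WHERE` clause against `${dataimporter.last_index_time}`) that selects rows by the `lastUpdated` column, and `delta_import_url` and `run_periodically` are hypothetical names. A cron job or any scheduler could play the same role.

```python
import time
from urllib.parse import urlencode


def delta_import_url(base, core):
    """URL for a DIH delta import that keeps existing docs and commits."""
    params = {"command": "delta-import", "clean": "false", "commit": "true"}
    return f"{base}/solr/{core}/dataimport?{urlencode(params)}"


def run_periodically(trigger, interval_s, rounds, sleep=time.sleep):
    """Call trigger() `rounds` times, sleeping interval_s between calls.

    If DIH is still busy and ignores one call, nothing is lost: rows
    changed in the meantime are still newer than the last index time,
    so the next delta import picks them up.
    """
    for i in range(rounds):
        trigger()
        if i < rounds - 1:
            sleep(interval_s)
    return rounds
```

For example, with `url = delta_import_url("http://localhost:8983", "user")`, calling `run_periodically(lambda: urllib.request.urlopen(url, timeout=10).close(), 300, 12)` would trigger a delta import every five minutes for an hour. Injecting `sleep` keeps the loop testable without waiting.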
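The id-based alternative can be sketched as a small client-side buffer: ids are collected while DIH is busy and sent together in one request once it reports idle. Everything here is hypothetical and rests on assumptions: `PendingIdBuffer`, `status_is_busy` and `fetch_is_busy` are illustrative names; the busy check assumes that `command=status` with `wt=json` returns a top-level `"status"` field that reads `"busy"` or `"idle"`; and the entity query is assumed to have been changed to accept a comma-separated list, for example `WHERE FIND_IN_SET(U.ID, '${dataimporter.request.id}') OR '' = '${dataimporter.request.id}'` (MySQL's `FIND_IN_SET` keeps the no-id case working, at the cost of not using an index on `ID`). Because the ids end up inside SQL, the sketch converts them to integers first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def status_is_busy(status_json):
    """True if a DIH status response reports a running import."""
    return json.loads(status_json).get("status") == "busy"


def fetch_is_busy(endpoint, timeout=5.0):
    """Poll <endpoint>?command=status&wt=json (needs a running Solr)."""
    url = f"{endpoint}?{urlencode({'command': 'status', 'wt': 'json'})}"
    with urlopen(url, timeout=timeout) as resp:
        return status_is_busy(resp.read().decode("utf-8"))


class PendingIdBuffer:
    """Collects ids and sends them in one import once DIH is idle."""

    def __init__(self, endpoint, is_busy, send):
        self._endpoint = endpoint   # e.g. http://localhost:8983/solr/user/dataimport
        self._is_busy = is_busy     # callable returning True while DIH runs
        self._send = send           # callable that issues the HTTP request
        self._pending = set()       # a set also de-duplicates repeated ids

    def add(self, ids):
        # int() rejects anything non-numeric before it can reach the SQL.
        self._pending.update(int(i) for i in ids)

    def flush(self):
        """Send all pending ids if DIH is idle; return True if sent."""
        if not self._pending or self._is_busy():
            return False  # keep buffering; try again on the next flush
        params = {
            "command": "full-import",
            "clean": "false",
            "commit": "true",
            "id": ",".join(str(i) for i in sorted(self._pending)),
        }
        self._send(f"{self._endpoint}?{urlencode(params)}")
        self._pending.clear()
        return True
```

A caller would `add` ids after each insert/update and call `flush()` on a short timer, e.g. with `is_busy=lambda: fetch_is_busy(endpoint)`. The check-then-send step is not atomic, so this assumes a single process sends imports to the core; if DIH still ignores a request, those ids are already cleared from the buffer and would have to be re-added.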