mysql elasticsearch elasticsearch-jdbc-river

Elasticsearch: Remove Duplicate Records from Index Documents


Here is my JDBC river definition for fetching all records from the database.

localhost:9200/_river/my_update_river/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/admin",
    "user" : "root",
    "password" : "",
    "poll" : "6s",
    "index" : "updateauto",
    "type" : "users",
    "schedule" : "0/10 * * ? * *",
    "strategy" : "simple",
    "sql" : "select * from users"
  }
}

When I run this river, I have two problems:

  1. Duplicate Records
  2. When I add new records to the database, the index documents are not updated when I search with:

    { "query": { "filtered": { "filter": { "term": { "Name": "testing" } } } } }

Here's my result.

    {
      "took" : 4,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 37551,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnNHmMKBTPrby96Jg",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul  Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnNHnMKBTPrby96Jk",
          "_score" : 1.0,
          "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjngk0MKBTPrby96Ka",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjngk0MKBTPrby96Kf",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnjA0MKBTPrby96Kh",
          "_score" : 1.0,
          "_source":{"ID":23,"Name":"Abudul Rafay","Email":"a","Password":"afasd"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnjA0MKBTPrby96Km",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnZP0MKBTPrby96KD",
          "_score" : 1.0,
          "_source":{"ID":24,"Name":"rafay","Email":"hello","Password":"fasfas"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnPe-MKBTPrby96Jq",
          "_score" : 1.0,
          "_source":{"ID":25,"Name":"r rafay ","Email":"r rafay","Password":"r rafay"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnR7NMKBTPrby96Ju",
          "_score" : 1.0,
          "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
        }, {
          "_index" : "updateauto",
          "_type" : "users",
          "_id" : "AUvjnbuLMKBTPrby96KO",
          "_score" : 1.0,
          "_source":{"ID":26,"Name":"New User","Email":"New","Password":"new"}
        } ]
      }
    }

I want results without duplicate records, and I want the index to update automatically.


Solution

  • I didn't quite follow your second question, but for the duplicate issue, here is what you need to do:

    Specify the document id in the river definition, as follows:

    "sql" : "select *, ID as _id from users"

    This way, the river indexes each user under its ID, so re-running the query overwrites the existing document instead of adding a duplicate.
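
    To see why mapping ID to _id removes the duplicates, here is a minimal sketch in Python. It does not talk to Elasticsearch: ToyIndex and run_river are hypothetical stand-ins I made up for illustration, and the rows are a trimmed copy of the data in the question. The only behavior it models is that a write without an id gets a fresh generated id, while a write with an existing id replaces the stored document.

```python
import uuid


class ToyIndex:
    """In-memory stand-in for an index (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}  # maps _id -> _source

    def index(self, source, doc_id=None):
        # Without an explicit id a fresh random id is generated, so the same
        # row written twice ends up as two separate documents.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # With an explicit id, a later write replaces the earlier one.
        self.docs[doc_id] = source
        return doc_id


# Rows as the river would read them from the users table (trimmed).
rows = [
    {"ID": 23, "Name": "Abudul Rafay"},
    {"ID": 24, "Name": "rafay"},
    {"ID": 25, "Name": "r rafay "},
]


def run_river(index, rows, use_row_id, runs=3):
    """Simulate the scheduled river re-reading every row on each run."""
    for _ in range(runs):
        for row in rows:
            doc_id = str(row["ID"]) if use_row_id else None
            index.index(row, doc_id=doc_id)
    return len(index.docs)


auto_count = run_river(ToyIndex(), rows, use_row_id=False)
keyed_count = run_river(ToyIndex(), rows, use_row_id=True)
print(auto_count, keyed_count)  # 9 3
```

    Three runs over three rows leave nine documents with generated ids but only three when the row ID is used as the id. The same overwrite behavior should also mean that a row changed in the database replaces its document on the next scheduled run. Note that documents already indexed under generated ids are not touched by the new query, so you will probably need to delete and rebuild the index once.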