django, elasticsearch, database-migration

What is the best practice for syncing MySQL and Elasticsearch?


I am using MySQL as my primary data store and have been running full-text index searches through it alone. My dataset now has 1M records, and I want to use Elasticsearch for searching. The problem is: how should I go about migrating the data?

In the real-time scenario, should I use async tasks (Celery, since my app runs on Django) to insert into Elasticsearch after MySQL has handled a create, update, or delete? Or should I have a script that runs every 10 minutes or so and polls the data? An asynchronous push would require me to maintain a RabbitMQ queue or something similar, which adds one more point of failure.

Which is the better approach: async push or pull?


Solution

  • All these approaches work

    Synchronisation with cron task

    • + easy to implement
    • + relatively fast if you use the bulk API
    • + easy to change the Elasticsearch structure (index names, mappings), because such a change requires reindexing the entire database and this approach does that anyway
    • - updates are not real-time
    • - unnecessary work for unchanged data
    • - you have to maintain the sync script whenever you change app logic or data structure
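
A minimal sketch of the data-shaping half of such a sync script, assuming rows arrive as dicts with an `id` primary-key column. The `bulk_actions` function, the `posts` index name, and the sample rows are hypothetical; the action dict layout is the one accepted by `elasticsearch.helpers.bulk` in the official Python client.

```python
from typing import Iterable, Iterator


def bulk_actions(rows: Iterable[dict], index: str) -> Iterator[dict]:
    """Turn database rows into bulk 'index' actions, one per row."""
    for row in rows:
        source = dict(row)          # copy so the caller's row is left untouched
        doc_id = source.pop("id")   # assumption: the primary key column is "id"
        yield {
            "_op_type": "index",    # create the document or overwrite it
            "_index": index,
            "_id": doc_id,          # reuse the MySQL key so re-runs overwrite
            "_source": source,
        }


# Hypothetical rows, as a cron job might read them from MySQL.
rows = [
    {"id": 1, "title": "first post"},
    {"id": 2, "title": "second post"},
]
actions = list(bulk_actions(rows, "posts"))
```

In a real job the generator could be passed straight to `elasticsearch.helpers.bulk(client, bulk_actions(rows, "posts"))`, which sends the actions in chunks. Reusing the database key as `_id` keeps repeated runs idempotent.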

    Synchronisation with hooks

    • + near real-time updates
    • + no useless work reindexing everything
    • -/+ depending on your app architecture, you have to maintain the hooks
    • - potential risk of data desynchronization (e.g. Elasticsearch was down when you updated your data) or of structure changes, so you still have to maintain a full data sync script
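
The hook approach and its desynchronization risk can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `action_for_change`, `push_change`, `flush_retries`, and the in-memory retry queue are all hypothetical names, and `send` stands in for whatever actually writes to Elasticsearch. In a Django app the push would typically be triggered from `post_save` / `post_delete` signal receivers, possibly via a Celery task.

```python
from collections import deque
from typing import Callable, Deque, Optional


def action_for_change(event: str, index: str, pk: int,
                      doc: Optional[dict] = None) -> dict:
    """Map a model change ("saved" or "deleted") to a bulk-style action."""
    if event == "deleted":
        return {"_op_type": "delete", "_index": index, "_id": pk}
    return {"_op_type": "index", "_index": index, "_id": pk, "_source": doc or {}}


def push_change(send: Callable[[dict], None], action: dict,
                retry_queue: Deque[dict]) -> bool:
    """Try to send one action; keep it for a later retry if sending fails."""
    try:
        send(action)
        return True
    except Exception:
        retry_queue.append(action)
        return False


def flush_retries(send: Callable[[dict], None], retry_queue: Deque[dict]) -> int:
    """Retry each queued action once; return how many were delivered."""
    delivered_count = 0
    for _ in range(len(retry_queue)):
        action = retry_queue.popleft()
        if push_change(send, action, retry_queue):
            delivered_count += 1
    return delivered_count


def send_while_down(action: dict) -> None:
    # Simulates Elasticsearch being unreachable.
    raise ConnectionError("elasticsearch is down")


delivered = []
pending: Deque[dict] = deque()

# The save happens while the search cluster is down, so the action is queued.
push_change(send_while_down, action_for_change("saved", "posts", 1, {"title": "hi"}), pending)
queued_while_down = len(pending)

# Later the cluster is back; here the "sender" just records what it receives.
flush_retries(delivered.append, pending)
```

An in-memory queue is lost when the process restarts, which is one reason a full sync script remains necessary as a fallback.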

    Use Elasticsearch as primary storage

    • + take from site
    • + scalability is amazing
    • + the schemaless structure is very helpful
    • - but sometimes you have to specify mappings for dates and set up index settings for specific field names. If you are not careful, you can "poison" a field name with an inappropriate datatype
    • + analytic queries are very powerful
    • + bulk api
    • - take here
    • - frequent releases that deprecate features; you have to fix your app before migrating to a new Elasticsearch version
    • -/+ no joins
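
One way to avoid "poisoning" a field is to declare the mapping explicitly before indexing anything. The sketch below assumes the typeless mapping format used by Elasticsearch 7 and later; the field names, `INDEX_BODY`, and the `invalid_date_fields` helper are hypothetical. The helper is only a client-side sanity check based on Python's ISO 8601 parsing, not a reproduction of Elasticsearch's own date parsing.

```python
from datetime import datetime

# Index body with an explicit mapping. "dynamic": "strict" makes Elasticsearch
# reject documents that contain fields not declared here.
INDEX_BODY = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "title": {"type": "text"},
            "created_at": {"type": "date"},
        },
    }
}


def invalid_date_fields(doc: dict, body: dict = INDEX_BODY) -> list:
    """Return names of date-mapped fields whose values do not parse as ISO 8601."""
    bad = []
    for name, spec in body["mappings"]["properties"].items():
        if spec["type"] == "date" and name in doc:
            try:
                datetime.fromisoformat(str(doc[name]))
            except ValueError:
                bad.append(name)
    return bad


good = invalid_date_fields({"title": "a", "created_at": "2016-05-01T10:00:00"})
bad = invalid_date_fields({"title": "b", "created_at": "yesterday"})
```

The body would be supplied when the index is created, so the date type is fixed before the first document can set it by accident.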

    They all work.