Search code examples
mysqlamazon-web-serviceselasticsearchrds

What's the best way to feed MySQL table data to ElasticSearch in real-time?


I need to feed MySQL (deployed with RDS on AWS) table data into ElasticSearch in real-time or near-real-time (maybe several minutes of delay), joining a couple tables in the process.

The first option I have investigated is Flink. But after some research I couldn't find a way to stream table data change because the tables are not append-only.

Then I found that some people are talking about CDC(Change Data Capture), basically streaming MySQL binlog changes to a lambda and parse it then post to ElasticSearch, but this just sounds too complicated and error prone.

Is there any industry tried-and-true ways to sync non-append-only tables to ElasticSearch?


Solution

  • You can use the logstash script to fetch data from mysql to elasticsearch.

    Sample Logstash Code

        input {
      jdbc {
        jdbc_driver_library => "<pathToYourDataBaseDriver>\mysql-connector-java-5.1.39.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/ecomdb"
        jdbc_user => <db username>
        jdbc_password => <db password>
        tracking_column¹ => "regdate"
        use_column_value²=>true
        statement => "SELECT * FROM ecomdb.customer where regdate >:sql_last_value;"
        schedule³ => " * * * * * *"
      }
    }
    output {
      elasticsearch {
        document_id⁴=> "%{id}"
        document_type => "doc"
        index => "test"
        hosts => ["http://localhost:9200"]
      }
      stdout{
      codec => rubydebug
      }
    }