Search code examples
elasticsearchcharacter-encodinglogstashlogstash-configuration

MYSQL to Elasticsearch via Logstash Problem: incompatible encodings: CP850 and UTF-8


I am using an elk stack via docker-compose with ES version 8.4.0

My goal is to use Logstash to copy an entire table from my MYSQL DB to ES. The connection works and Logstash copies about 30 entries with no problems. But then I get a long error message.

[2022-09-10T18:41:26,318][ERROR][logstash.outputs.elasticsearch][main][757e3825fce0788f949869472d03e028630de9d063200717b56bc9ceefe29d81] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>["org/jruby/ext/stringio/StringIO.java:1162:in write'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in block in bulk'", "org/jruby/RubyArray.java:1865:in each'", "org/jruby/RubyEnumerable.java:1143:in each_with_index'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in safe_bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in submit'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in retrying_submit'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in multi_receive'", "D:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

I suspect this error is the reason: {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError

My config file look like that:

  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar" 
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts" 
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *" 
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added 
    FROM telegram.channel_results where id >:sql_last_value;"
    use_column_value => true
    tracking_column => "id"
    
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }

  stdout {
    codec => rubydebug
  }
}

I notice that if I removed the text column from the query, the process runs without any problems. In my database, the text column is of SQL type text. I suspect an encoding problem because there are also Russian texts and emotes included. I need a solution to also copy the texts in ES. Maybe it is a encoding problem with emotes and other characters in the text ?!


Solution

  • Try this below charset encoding input filter.

     jdbc {
        clean_run => true
        jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar" 
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/posts" 
        jdbc_user => "sqluser"
        jdbc_password => "sqlpassword"
        schedule => "* * * * *" 
        statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added 
        FROM telegram.channel_results where id >:sql_last_value;"
        use_column_value => true
        tracking_column => "id"
        columns_charset => {
                "text" => "ISO-8859-5"
        }
      }