
Logstash from SQL Server to Elasticsearch character encoding problem


I am using ELK stack v8.4.1 and trying to load data from SQL Server into Elasticsearch via Logstash. My source table contains Turkish characters (collation SQL_Latin1_General_CP1_CI_AS). When Logstash writes these values to Elasticsearch, the Turkish characters are replaced with '?'. For example, 'Şükrü' => '??kr?'. (I previously used ELK stack v7.* and did not have this problem.)

This is my config file:

input {
    jdbc {
        jdbc_connection_string => "jdbc:sqlserver://my-sql-connection-info;encrypt=false;characterEncoding=utf8"
        jdbc_user => "my_sql_user"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_driver_library => "my_path\mssql-jdbc-11.2.0.jre11.jar"
        statement => [ "Select id,name,surname FROM ELK_Test" ]
        schedule => "*/30 * * * * *"
    }
    stdin {
        codec => plain { charset => "UTF-8" }
    }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_index"
    document_id => "%{id}" 
    user => "logstash_user"
    password => "password"
  }
  stdout { codec => rubydebug }
}

I tried both with and without a filter that forces the encoding to UTF-8, but the result does not change.

filter {
    ruby {
        code => 'event.set("name", event.get("name").force_encoding(::Encoding::UTF_8))'
    }
}
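As a side note on why this filter is unlikely to help: in Ruby, `force_encoding` only changes the encoding label on a string and never converts or restores bytes. The sketch below uses plain strings as stand-ins for the `name` field (it does not run inside Logstash), and `relabel` is a hypothetical helper that mirrors what the filter does.

```ruby
# Illustration only: plain strings stand in for the Logstash "name" field.

# relabel is a hypothetical helper doing what the ruby filter above does:
# it tags the string as UTF-8 without touching its bytes.
def relabel(value)
  value.dup.force_encoding(Encoding::UTF_8)
end

intact  = "Şükrü"   # already valid UTF-8, as in the rubydebug console output
damaged = "??kr?"   # characters already replaced with '?' somewhere else

# Relabeling a string that is already UTF-8 is a no-op.
puts relabel(intact) == intact   # prints "true"

# Relabeling cannot bring back characters that were already replaced.
puts relabel(damaged)            # prints "??kr?"
```

So if the field is already correct inside the pipeline, the filter changes nothing, and if the characters are lost at a later stage, the filter runs too early to matter.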

Below is my Elasticsearch result:

{
  "_index": "test_index",
  "_id": "2",
  "_score": 1,
  "_source": {
    "name": "??kr?",
    "@version": "1",
    "id": 2,
    "surname": "?e?meci",
    "@timestamp": "2022-09-16T13:02:00.254013300Z"
  }
}
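The '?' pattern in this result looks like what a lossy conversion to a character set without these letters produces. The following Ruby sketch reproduces the same strings by transcoding to US-ASCII with replacement. Both the choice of US-ASCII and the `lossy_transcode` helper are assumptions made for illustration; this does not show where in the Logstash pipeline such a conversion actually happens.

```ruby
# Illustration only: reproduces the '?' pattern with a lossy transcoding.

# lossy_transcode is a hypothetical helper; US-ASCII is an assumed target
# charset, picked because it lacks every non-ASCII letter in the samples.
def lossy_transcode(text, target = "US-ASCII")
  # Characters with no mapping in the target charset become "?".
  text.encode(target, invalid: :replace, undef: :replace, replace: "?")
end

puts lossy_transcode("Şükrü")     # prints "??kr?"
puts lossy_transcode("Çeşmeci")   # prints "?e?meci"
```

Once a character has been replaced this way, the original cannot be recovered from the stored value, so the fix has to be applied wherever the conversion takes place.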

By the way, the console (stdout) output is correct:

{
          "name" => "Şükrü",
      "@version" => "1",
            "id" => 2,
       "surname" => "Çeşmeci",
    "@timestamp" => 2022-09-16T13:32:00.851877400Z
}

I also tried inserting sample data from Kibana Dev Tools, and it was inserted without a problem. Can anybody help, please? What could be wrong? What can I check?


Solution

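  • The solution was to change the JDK version. I replaced the bundled OpenJDK with Oracle JDK 19, and the problem was solved. A possible explanation, not verified here, is that JDK 18 and later use UTF-8 as the default charset (JEP 400) regardless of the operating system locale.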